ABSTRACT

We conduct a pilot investigation to determine the optimal combination of color and variability
information to identify quasars in current and future multi-epoch optical surveys. We use a Bayesian
quasar selection algorithm (Richards et al. 2004) to identify 35,820 type 1 quasar candidates in a
239 deg² field of the Sloan Digital Sky Survey (SDSS) Stripe 82, using a combination of optical
photometry and variability. Color analysis is performed on 5-band single- and multi-epoch SDSS optical
photometry to a depth of r ~ 22.4. From these data, variability parameters are calculated by
fitting the structure function of each object in each band with a power law model using 10 to > 100
observations over timescales from ~ 1 day to ~ 8 years. Selection was based on a training sample
of 13,221 spectroscopically confirmed type 1 quasars, largely from the SDSS. Using variability alone,
colors alone, and combining variability and colors we achieve 91%, 93%, and 97% quasar
completeness and 98%, 98%, and 97% efficiency respectively, with particular improvement in the selection of
quasars at 2.7 < z < 3.5 where quasars and stars have similar optical colors. The 22,867 quasar
candidates that are not spectroscopically confirmed reach a depth of i ~ 22.0; 21,876 (95.7%) are
dimmer than a coadded i-band magnitude of 19.9, the cutoff for spectroscopic follow-up for SDSS on
Stripe 82. Brighter than 19.9, we find 5.7% more quasar candidates without confirming spectra in
sky regions otherwise considered complete. The resulting quasar sample has sufficient purity (and
statistically correctable incompleteness) to produce a luminosity function comparable to those
determined by spectroscopic investigations. We discuss improvements that can be made to the process
in preparation for performing similar photometric selection and science on data from post-SDSS sky
surveys.

Keywords: catalogs, galaxies: active, surveys 


1. INTRODUCTION 

Identification of large numbers of quasars/active galactic
nuclei (AGN) over a broad range of redshift and luminosity
is crucial for many science projects. Work that requires
object densities higher than have been provided to
date by spectroscopic surveys includes cross-correlating
the catalogs with the cosmic microwave background
(Giannantonio et al. 2008) to constrain dark energy; using
quasars to measure cosmic magnification (Scranton et al.
2005); finding binary quasars, which can be used to test
the merger hypothesis of quasars (Hennawi et al. 2010);
finding gravitationally lensed quasars (Oguri et al. 2006);
constraining quasar evolution (Myers et al. 2006);
studying dust in galaxies (Menard et al. 2010); and broader
cosmological studies (Leistedt et al. 2013).

Historically, quasar candidates have been identified by
virtue of their colors, variability, and (lack of) proper
motion, but generally not through all of these methods
combined. The standard way of identifying large
numbers of candidate quasars is to make “color cuts”
(e.g., Richards et al. 2002; Croom et al. 2004; Warren
et al. 2000; Lacy et al. 2004; Stern et al. 2005; Maddox
et al. 2012; Assef et al. 2013). This is because the
majority of unobscured
quasars at z < 2.5 are much bluer than the majority of
stars in the optical and are much redder in the infrared.
However, this process is neither complete (identifying all 
true quasars) nor efficient (minimizing false positives). 
Such methods do an effective job of identifying a large 
number of interesting objects with relatively little effort; 
however, better methods are needed to scale to future 
surveys in a way that allows scientific analysis without 
the need for spectroscopic confirmation. 
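
The two figures of merit used throughout (completeness and efficiency) can be written down compactly. The following is a minimal sketch, assuming boolean arrays that flag the true quasars and the selected candidates; the function name and the toy arrays are illustrative and are not taken from the analysis itself.

```python
import numpy as np

def completeness_and_efficiency(is_quasar, selected):
    """Return (completeness, efficiency) for boolean truth and selection arrays."""
    is_quasar = np.asarray(is_quasar, dtype=bool)
    selected = np.asarray(selected, dtype=bool)
    true_positives = np.sum(is_quasar & selected)
    # Completeness: fraction of all true quasars that were selected.
    completeness = true_positives / np.sum(is_quasar)
    # Efficiency: fraction of selected candidates that are true quasars.
    efficiency = true_positives / np.sum(selected)
    return completeness, efficiency

# Toy example: 4 true quasars, 4 objects selected, 3 in common.
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
picked = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=bool)
comp, eff = completeness_and_efficiency(truth, picked)
```

In this toy case one quasar is missed and one contaminant is selected, so both quantities fall below unity.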

In addition to classification by color, time-domain data 
make variability a promising way for classifying objects. 
Many current and future astronomical imaging surveys
(SkyMapper: Keller et al. 2007; Palomar Transient
Factory: Law et al. 2009; Pan-STARRS: Kaiser et al.
2010; DES: The Dark Energy Survey Collaboration 2005;
LSST: Ivezic et al. 2008) are focusing on time-domain
astronomy, and in anticipation it is important to determine
the effectiveness of classification using variability
information. These surveys will observe areas of sky many
times. There is great hope that variability selection will
fill in the gaps in color selection methods (or replace
color selection entirely). Indeed, investigations such as
Schmidt et al. (2010), MacLeod et al. (2011), and
Butler & Bloom (2011) have been quite successful.
However, variability-only selection suffers from its own set
of problems. For example, high-redshift quasars can be
lost when using a fixed observed-frame variability
analysis: Lyα absorption reduces the quasar continuum in blue
bands and the redder bands have larger photometric
errors for fainter objects. In addition, variability increases
with lower luminosity (e.g., Vanden Berk et al. 2004), but
so does the host galaxy contribution, potentially
complicating selection of such objects without careful difference
imaging to remove the host galaxy contribution. Thus it
is important to investigate how well variability selection
works by itself versus being combined with other
methods (e.g., colors and astrometry).

The premise of this project is to simultaneously use
the distinctive and quantifiable characteristics of color
and variability to distinguish quasars from stars and
inactive galaxies. The Sloan Digital Sky Survey (SDSS;
York et al. 2000) repeatedly imaged a 2.5° equatorial
section of the sky referred to as Stripe 82^9 (Abazajian
et al. 2009; Annis et al. 2014; Jiang et al. 2014). The
light curves of spectroscopically confirmed quasars and
stars from Stripe 82 give us the information we need to
develop and test classification of quasars.

The specific goal of this project is to use color,
variability, and astrometric data in combination with
modern machine learning techniques to uncover previously
unidentified quasars in the SDSS Stripe 82 region and to
pave the way for improved multi-faceted selection in the
future. The goal is not necessarily to produce the most
complete or efficient catalog possible, but to test the
combined use of colors and variability data in classification.
In this pilot investigation we make some simplifications
to the process that will be explored in more detail in
future work. Specifically, we concentrate on point sources
to avoid the problem of the host galaxy washing out the
variable nucleus (reducing our sensitivity to low-redshift
quasars), we utilize a simple power-law model of
variability as opposed to more sophisticated (but not
necessarily “correct”) models such as the damped random walk,
we use variability data from each band separately
instead of merging them together, and we take a simplistic
approach to combining photometric redshift information
from different methods. Each of these simplifications
is worthy of its own separate investigation
to determine how best to deal with these issues.

A shortcoming of the traditional quasar identification
process is that it usually involves selecting quasar
candidates by identifying them as outliers using cuts in the
observed data space (e.g., selecting all point sources with
u − g < 0.6). Our classification instead makes
simultaneous use of all of the data types available and uses
modern statistical techniques (based on kernel density
estimation; KDE) to make cuts in probability space (e.g.,
objects with an expected quasar probability greater than
50%). We will extend the methods developed by our
group (Richards et al. 2004; Riegel et al. 2008; Richards
et al. 2009a; Richards et al. 2009b) and others (e.g.,
Suchkov et al. 2005; Ball et al. 2006; Davoodi et al. 2006;
Gao et al. 2008; Bailer-Jones et al. 2008; D’Abrusco et al.
2009; Guy et al. 2010; Schmidt et al. 2010; Abraham
et al. 2012; Bovy et al. 2012; Peng et al. 2012; Gupta
... domain focused sky surveys. While this approach has
been shown to work well in the past (e.g., Richards et al.
2004, 2009a), in future work we also intend to explore
other modern statistical techniques such as those described by
Feigelson & Babu (2012) and references therein.

9 sdss.org/legacy/stripe82.html
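
The contrast between a cut in data space and a cut in probability space can be illustrated with a toy calculation. The sketch below is not the NBC KDE algorithm itself: it assumes a single color (u − g), invented Gaussian training distributions, a hypothetical quasar prior of 5%, and uses `scipy.stats.gaussian_kde` as a stand-in density estimator.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy training colors (u - g); these distributions are invented for illustration.
quasar_ug = rng.normal(0.2, 0.15, 500)
star_ug = rng.normal(1.2, 0.30, 500)

# One kernel density estimate per class.
kde_quasar = gaussian_kde(quasar_ug)
kde_star = gaussian_kde(star_ug)

def quasar_probability(ug, prior_quasar=0.05):
    """Posterior P(quasar | u-g) from two class KDEs and an assumed prior."""
    ug = np.atleast_1d(ug)
    weighted_q = kde_quasar(ug) * prior_quasar
    weighted_s = kde_star(ug) * (1.0 - prior_quasar)
    return weighted_q / (weighted_q + weighted_s)

test_colors = np.array([0.1, 0.6, 1.3])
hard_cut = test_colors < 0.6                       # cut in observed data space
prob = quasar_probability(test_colors)
prob_cut = prob > 0.5                              # cut in probability space
```

The probability cut adapts to the training densities and the prior, whereas the hard cut treats every object on one side of the line identically.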

The quasar candidates that result from application of
this method are only identified photometrically; they lack
spectroscopy, which not only would confirm the type of
an object but, crucially, would also determine the redshift.
There are many sophisticated methods for estimating
photometric redshifts (e.g., Rowan-Robinson et al. 2008;
Salvato et al. 2009); we use the algorithm described in
Richards et al. (2001) and Weinstein et al. (2004), which
ranks among the most accurate for (luminous) quasar
photometric redshift estimates. We improve this
process further by using the effective prismatic effects of
the Earth’s atmosphere as a low-resolution spectrograph
(Kaczmarczik et al. 2009). In short, the positions of
quasars, with their strong emission features, are a
function of pass band and redshift. This behavior of quasars
allows us to uniquely incorporate astrometric
information into our photometric redshift estimates.

Our work provides a stepping stone for quasar
classification for future surveys such as the Large Synoptic
Survey Telescope^10 (LSST). Eventually, each region of LSST
will be imaged about 200 times in each filter over the 10
years of the survey, allowing for study of the
variability of the object on scales of minutes to a decade. This
focus on time-domain astronomy is an exciting new era
in surveys, but the lack of spectroscopy creates a
problem for confirming the type of an object. As the number
of spectroscopic fibers allocated to quasar identification
pales in comparison to the number of photometrically
detected objects that merit spectroscopic follow-up, it
is only through complete and efficient object
classification coupled with accurate redshift estimates that we can
overcome the lack of spectroscopy in LSST and other
future astronomical surveys and maximize their science
output.

The layout of this paper is as follows. In Section 2
we introduce the SDSS Stripe 82 data that we will use.
We then describe how the variability parameters used for
classification are calculated. In Section 3 we summarize
the NBC KDE selection algorithm and describe how it
is used in this case. We test the various classification
parameters and determine the optimal combination in
Section 4. Then, in Section 5 we build the quasar
candidate catalog using these optimal parameters, first using
the full quasar training set, then using the training set

10 lsst.org
Table 1
Master Quasar Catalog

Source                                               Description                   w/ spectra   w/o spectra   Training Set
Table 5 from Schneider et al. (2010)                 SDSS I/II                         105472             0           6082
Croom et al. (2004)                                  2QZ                                 9663             0              0
Croom et al. (2009)                                  2SLAQ                               8881             0           1576
Croom et al. in prep.                                AUS                                 2200             0           1706
Kochanek et al. (2012)                               AGES                                2844             4              0
Lilly et al. (2007) and Elvis et al. (2009)          COSMOS                               259             0              0
Fan et al. (2006) and Jiang et al. (2008)            z > 5.8                               27             0              0
Paris et al. (2014)                                  SDSS-III/BOSS                     168820             0           7383
Ross et al. (2012)                                   MMT                                  836             0            278
Richards et al. (2009a,b)                            NBC KDE Photometric Catalog       174663        965542           9061
Bovy et al. (2011)                                   XDQSO Photometric Catalog         142567        682831           7088
Table 6 of Papovich et al. (2006)                    BROADLINE objects                    104             0              0
Table 5 of Glikman et al. (2006)                     z ~ 4                                 10             0              0
Tables 4 and 6 of Maddox et al. (2012)               KX-selected                         3608             0            986
Total                                                -                                 274329       1301846          13221


divided into redshift bins to perform simultaneous
classification and redshift estimation. In Section 6 we describe
how the astrometric parameters are calculated, then
estimate photometric and astrometric redshifts for all the
candidate quasars. Next, we describe a cut to remove
contamination and describe the final catalog of quasar
candidates in Section 7. In Section 8 we compare to cuts
in variability space and to color-based quasar selection,
and calculate number counts and a luminosity function
for the candidates. We discuss possible next steps in
Section 9 and conclude in Section 10.

Cosmology-dependent parameters are determined
using H_0 = 70 km s^-1 Mpc^-1, Ω_m = 0.3, and Ω_Λ = 0.7
(Hinshaw et al. 2013). Throughout this paper
magnitudes will be reported on the AB system of Oke & Gunn
(1983).


2. DATA 


In this section, we describe the origin of the data and
the parameters used for classification by our algorithm.
Section 2.1 describes the imaging data and Section 2.2 the
spectroscopic data. Sections 2.3 and 2.4 discuss derivation
of the color and variability classification parameters,
respectively. In principle, we could use astrometric
information for classification as well; however, for this pilot
study we have limited the use of astrometric data to estimating
photometric redshifts, as discussed in Section 6. Machine
learning algorithms need both training sets to find
patterns in the data and a test set of data to verify that
these patterns are useful; these data sets are described
in Section 2.5.


2.1. SDSS Stripe 82 

The SDSS is an optical survey that has used the 2.5-m
Sloan telescope (Gunn et al. 2006) at Apache Point
Observatory in New Mexico to map 14,500 deg² of the
sky (Aihara et al. 2011). Photometry was performed
with a drift-scan CCD camera (Gunn et al. 1998) taking
nearly simultaneous 54.1 second exposures in five broad
optical bands (u, g, r, i, and z) between 3,000 Å and
10,000 Å (Fukugita et al. 1996).


The imaging data used in our analysis consist of
objects solely from the SDSS Stripe 82 area, which were
made available as part of SDSS Data Release 7 (DR7;
Abazajian et al. 2009) and include observations from
October 1999 to November 2007. The Stripe 82 region
covers a 2.5° wide ‘stripe’ on the celestial equator from
right ascension ~300° to ~60° in the Southern Galactic
Cap. Repeated observations were performed on this
region throughout the SDSS I/II, with increasing frequency
as part of the SDSS Supernova Survey (Frieman et al.
2008), with ~100 repeat imaging scans by the end of
observations. The initial observations were done under
optimal seeing, sky brightness, and photometric
conditions. The supernova survey runs were done on useable
nights, but under less than optimal conditions. We limit
our analysis to those objects detected as point sources.

The multiple observations on Stripe 82 were aligned
and stacked into a coadded catalog described in Annis
et al. (2014) (see also Jiang et al. 2014 and Huff et al.
2014). This catalog uses 20 to 40 observations on the
region, mostly the early runs under optimal conditions.
The data were downloaded from the SDSS Stripe 82
Catalog Archive Server (CAS).^11 Database entries having
SDSS “run” numbers of 106 and 206, representing
objects with coadded photometry, were extracted along
with the individual epoch photometry for each of these
objects in order to generate light curves.^12 The single
epoch images go to a depth of r ~ 22.4 (5σ) with a
median seeing of 1.4". Coaddition of the imaging data
reaches ~ 2 magnitudes deeper and improves the median
seeing to 1.1". The improvement in using coadded
magnitudes over single epoch magnitudes for classification is
demonstrated in Section 4.2; see also Ivezic et al. (2007).


2.2. Master Quasar Catalog 


Definition of our quasar training set requires a
subsample with spectroscopic confirmation. Our primary source
of spectroscopy comes from a “Master” Quasar Catalog
(MQC), described in Section 2.1 of Richards et al. (2015,
submitted), containing over 1.5 million sources, of which
over 250,000 have confirming spectroscopy. This dataset
consists of sources within the SDSS survey areas and
draws objects from the sources described in Table 1.
This quasar sample represents nearly every quasar
known fainter than i ~ 16 (including candidate
photometric quasars) at the time of Data Release 10 (DR10;
Ahn et al. 2014) of SDSS-III (Eisenstein et al. 2011;
Dawson et al. 2013). The majority of the confirmed quasars
come from the SDSS-I/II quasar catalog, which is
described in detail by Richards et al. (2002) and Schneider
et al. (2010), and from the SDSS-III/BOSS quasar
catalog, which is described in detail by Ross et al. (2012) and
Paris et al. (2014).

11 http://cas.sdss.org/stripe82/en

12 This process has since been made somewhat easier through the use of a unifying “thingIndex”
table in Data Release 12 (Alam et al. 2015): http://skyserver.sdss.org/dr12/en/help/browser/browser.aspx

Figure 1. Quasar and non-quasar training sets in two projections of the SDSS color space using coadded photometry. Non-quasars (shown
in orange contours), such as stars and compact galaxies, are considered contaminants when trying to accurately classify quasars (shown in
cool colors). Notice the number of non-quasars in the region in which mid-redshift quasars (2.2 < z < 3.5; shown as dark blue contours
and scatter points for outliers) lie. This overlap makes it difficult to accurately classify an object in this region as a quasar or non-quasar
and motivates searches for alternative methods of classification, like variability. Quasars are shown as three redshift regions: low-redshift
(z < 2.2; shown as green contours and scatter points for outliers), mid-redshift, and high-redshift (z > 3.5; shown as light blue dots). The
extension of the non-quasar color space at g − r ~ 1.4 is not real, but an artifact of including objects with large u-band photometric errors
(and thus spilling into the true quasar parameter space).


The SDSS-I/II quasars were primarily color selected
(with some radio and X-ray selection) over a broad
redshift range (0 < z < 5). Richards et al. (2002) describe
the quasar target selection of the main quasar survey,
which went to i < 19.1 for quasars with colors
consistent with z < 3 and to i < 20.2 for quasars expected
to be at higher redshifts. On Stripe 82, deeper targeting
was performed (Adelman-McCarthy et al. 2006), going to
i = 19.9 and i = 20.4, respectively, in targeting “chunk”
22; to i = 20.2 (for low-redshift sources) and i = 20.65
(for radio sources) in targeting chunk 48; and to i < 21
for sources more variable (between two epochs) than 3σ
(and 0.1 mag) in both g and r in targeting chunk 73.


The BOSS quasars (focused on 2.2 < z < 3.5; Ross
et al. 2012) were, in addition to color selection, also
targeted by variability (on Stripe 82). This variability
selection is described in Palanque-Delabrouille et al. (2011)
and uses an algorithm that was also based on the same
parameterization of variability as used herein (see
Section 2.4). Thus it is interesting to see if our method
finds additional quasars beyond those already
spectroscopically confirmed. Quasar candidates in our catalog
that are previously known from SDSS-I/II and SDSS-III
spectroscopy are indicated as such in our catalog; see
Appendix A.


2.3. Classification Parameters: Colors 

The optical color information used in our analysis
consists of the four adjacent SDSS colors (u − g, g − r,
r − i, and i − z), which were determined from the
cataloged photometry using point-spread-function
magnitudes, corrected for Galactic extinction (Schlegel et al.
1998). We used both single-epoch colors, from a single
observation of the object, and the coadded colors, from
the Annis et al. (2014) catalog.
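
As a concrete illustration of the color parameters, the following minimal sketch forms the four adjacent colors from extinction-corrected magnitudes. The magnitude and extinction values are hypothetical, and the dictionary layout is an assumption made for the example rather than the catalog's actual schema.

```python
import numpy as np

BANDS = ("u", "g", "r", "i", "z")

def adjacent_colors(psf_mag, extinction):
    """Return (u-g, g-r, r-i, i-z) from extinction-corrected PSF magnitudes."""
    corrected = np.array([psf_mag[b] - extinction[b] for b in BANDS])
    # Differencing neighbouring bands gives the four adjacent colors.
    return corrected[:-1] - corrected[1:]

# Hypothetical PSF magnitudes and Galactic extinction values (mag).
mags = {"u": 20.10, "g": 19.85, "r": 19.70, "i": 19.60, "z": 19.55}
ext = {"u": 0.15, "g": 0.11, "r": 0.08, "i": 0.06, "z": 0.04}
colors = adjacent_colors(mags, ext)
```

The same function applies unchanged to single-epoch or coadded magnitudes, since only the input photometry differs.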

The level of contamination from stars and galaxies
varies significantly in various regions of color space; see
Figure 1. Optical surveys for quasars often use
relatively simple color cuts (drawing lines of demarcation
in these color spaces) to select objects that are likely to
be quasars. In SDSS, outliers from the stellar locus in
the color space were potential spectroscopic target
candidates (Richards et al. 2002). The ugri bands were used to
identify low-redshift quasars and the griz bands for
high-redshift quasars. For low- and high-redshift quasars,
selecting by colors is effective, but mid-redshift quasars
(2.2 < z < 3.5) occupy the same region of color space as
many stars and contamination becomes a serious
problem. Note how the mid-redshift quasars, shown as dark
blue contours and scatter points in Figure 1, overlap with
the non-quasars, shown as orange contours. It is most
efficient to choose quasars outside of this redshift region for
spectroscopic follow-up, but this creates a strong
selection effect in the quasar sample. For efficient selection
of mid-redshift quasars, it becomes necessary to have
another method to distinguish the quasars from
non-quasars and this is where the variable nature of quasars
becomes particularly useful.
Figure 2. g and u-band light curves (left panel) and g-band structure function fit with a power law model (right panel) of SDSS J013417.81-005036.2,
a redshift 2.26 quasar from SDSS Stripe 82 (also shown in Figure ...). This quasar is shown as an example representative of
the data set. Left panel: There are 126 observations in the g-band. The 106 observations that meet the PSF-width and the airmass
requirements are shown as green points with error bars, while those that were removed are shown in orange. The dark green dashed line is
the running median (with a window of 50 days and steps of 5 days) calculated from the g-band observations. The orange dot was removed
from the light curve because it is more than 0.25 magnitude from the median. The u-band observations are similarly shown in blue and red.
Right panel: The pairs of photometric points from the g-band light curve in the left panel are shown as a hex-bin density plot where the
darkness of the hex bin indicates the number of points in that bin. The power law fit is shown as a green line. The method for calculating
the structure function and the equation used to fit the structure function are detailed in Section 2.4. In the case of this object, the fitting
algorithm gives A_g = 0.105 and γ_g = 0.102. The points removed as outliers in the left panel would only contribute |Δm| > 0.25 mag
values.

2.4. Classification Parameters: Variability

Most quasars vary at optical wavelengths by about 10%
over several years, which distinguishes them from most
normal galaxies and stars (de Vries et al. 2003; Vanden
Berk et al. 2004). Most variable stars vary periodically
or quasi-periodically (Richards et al. 2012) and with
smaller amplitude, but quasars generally show no
periodic variability (Bailer-Jones 2012; Andrae et al. 2013).
While the physical causes for the variability in quasars
are not well understood (see Dexter & Agol 2011 for a
recent investigation), the nature of the variability enables
one to distinguish quasars from non-quasars.

We use the structure function to characterize
variability by quantifying the amplitude of variability as a
function of the time difference between paired observations.
For this analysis, based on empirical experiment
(balancing the number of epochs with the quality of the data),
we required that the FWHM of the PSF fit in the r band
be less than 2" and the airmass in the r band be less
than 1.575 for the observation to be included. These
cuts remove approximately 15% of observations. After
this procedure, we found that a small number of
non-astrophysical outliers in the light curve still must be
removed; these points are such strong outliers that we are
not concerned that removing them is compromising the
variability analysis. Similar to the approach in Schmidt
et al. (2010), we accomplish this by calculating a
running median light curve, then removing all measurements
with a difference between the median light curve and the
observed magnitude greater than 0.25 magnitudes
(Figure 2, left panel). The structure function is calculated
in all of the SDSS bands where at least 10 observations
remain after these cuts.

There are other methods currently being used to
characterize the variability of quasars, including Slepian
wavelet variance (SWV; Graham et al. 2014),
AutoRegressive Moving Average, or ARMA, processes (Kasliwal
et al. 2015), or damped random walk (DRW; Kelly et al.
2009; Kozlowski et al. 2010). Future work could consider
using these methods instead of the structure function.
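
The quality cuts and running-median clipping described in this section can be sketched as follows. This is a simplified illustration, not the pipeline used for the catalog: the running median is evaluated at every epoch with a centered window rather than on a stepped grid, and the function name, argument names, and toy light curve are assumptions made for the example.

```python
import numpy as np

def clean_light_curve(t, mag, fwhm_r, airmass_r,
                      window_days=50.0, max_dev=0.25, min_obs=10):
    """Apply seeing/airmass cuts, then clip points far from a running median."""
    t, mag = np.asarray(t, float), np.asarray(mag, float)
    # Quality cuts from the text: r-band PSF FWHM < 2" and airmass < 1.575.
    good = (np.asarray(fwhm_r) < 2.0) & (np.asarray(airmass_r) < 1.575)
    t, mag = t[good], mag[good]
    # Running median at each epoch (simplified: centered window, no step grid).
    med = np.array([np.median(mag[np.abs(t - ti) <= window_days / 2.0])
                    for ti in t])
    keep = np.abs(mag - med) <= max_dev
    t, mag = t[keep], mag[keep]
    # A band is usable only if at least min_obs epochs survive.
    return t, mag, t.size >= min_obs

# Toy light curve: 20 epochs, one bad-seeing epoch and one 1-mag outlier.
t = np.arange(0.0, 60.0, 3.0)
mag = 20.0 + 0.02 * np.sin(t / 10.0)
mag[5] += 1.0
fwhm = np.full(t.size, 1.4)
fwhm[0] = 2.5
airmass = np.full(t.size, 1.2)
t_clean, mag_clean, usable = clean_light_curve(t, mag, fwhm, airmass)
```

Here the first epoch fails the seeing cut and the artificial outlier at t = 15 days is clipped, leaving 18 of the 20 epochs.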

In our work, the structure function is defined as the
root mean square magnitude difference as a function of
time lag between magnitude measurements:

    V^2(\Delta t) = \langle (m(t) - m(t + \Delta t))^2 \rangle    (1)

In the above equation, m(t) − m(t + Δt) is the
measured magnitude difference between two observations in
a given band and Δt is the time difference between the
two observations in the observer’s frame. The SDSS has
a high cadence of observations during the fall months
each year and then gaps of ~9 months before the next
set of observations. This irregular sampling in the light
curve (Figure 2, left panel) results in a structure function
with gaps (Figure 2, right panel).

The structure function can be modeled as a power law
(Equation 3 in Schmidt et al. 2010):

    V_{PowerLaw}(\Delta t | A, \gamma) = A \left( \frac{\Delta t}{1\,{\rm year}} \right)^{\gamma}    (2)

Such a parameterization is not effective at describing the
underlying type of variability or the mechanism for it,
but provides a sufficiently robust statistical description
for the timescales (~ 1 day to ~ 8 years) covered by our
data (Schmidt et al. 2010) to distinguish variable sources
from non-variable sources, which is our objective. Using
this model for the structure function, we find that 93%
of quasars are more variable than non-variable stars on
average (using white dwarfs as representative) and show
more growth in variability at longer time scales than 80%
of non-quasar point sources.
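
To make the definitions in Equations (1) and (2) concrete, the sketch below builds all unique pairs of a light curve, evaluates a binned root mean square structure function, and evaluates the power law model. The helper names and the three-point toy light curve are illustrative assumptions; the A and γ values in the last line are the g-band values quoted for the example quasar in Figure 2.

```python
import numpy as np

def pair_differences(t, mag):
    """Return |dt| and |dm| for every unique pair of observations."""
    t, mag = np.asarray(t, float), np.asarray(mag, float)
    j, k = np.triu_indices(t.size, k=1)   # each pair is counted once
    return np.abs(t[j] - t[k]), np.abs(mag[j] - mag[k])

def binned_sf(dt, dm, edges):
    """RMS magnitude difference per time-lag bin (NaN for empty bins)."""
    idx = np.digitize(dt, edges) - 1
    return np.array([np.sqrt(np.mean(dm[idx == b] ** 2)) if np.any(idx == b)
                     else np.nan for b in range(len(edges) - 1)])

def power_law_sf(dt_years, A, gamma):
    """Equation (2): V = A * (dt / 1 yr)**gamma, with dt in years."""
    return A * np.asarray(dt_years, float) ** gamma

# Toy light curve with three epochs (times in years, magnitudes in mag).
dt, dm = pair_differences([0.0, 1.0, 3.0], [20.0, 20.1, 20.4])
sf = binned_sf(dt, dm, edges=[0.0, 1.5, 3.5])
model_at_1yr = power_law_sf(1.0, A=0.105, gamma=0.102)
```

By construction the model evaluated at Δt = 1 year returns A, which is why A can be read as the typical variability amplitude on a one-year timescale.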


The variability can also be modeled as a DRW (Kelly
et al. 2009; Kozlowski et al. 2010; MacLeod et al. 2010),
which predicts the following form of the structure
function:

    V_{DRW}(\Delta t | \sigma, \tau) = \sqrt{2}\,\sigma \left( 1 - e^{-\Delta t/\tau} \right)^{1/2}.    (3)

To first order in Δt, the DRW behaves as:

    V_{DRW}(\Delta t | \sigma, \tau) \approx \sqrt{2}\,\sigma \left( \frac{\Delta t}{\tau} \right)^{1/2},    (4)

a realization of Equation 2 where γ = 1/2. In short,
the DRW model is similar to the power-law model
except that it truncates the growth of the magnitude
differences at some characteristic timescale. For the sake
of this proof of concept, the power law model will
suffice and is what we shall use hereafter. In future work
we will investigate whether a more sophisticated model,
such as the DRW model, improves quasar selection;
however, even that model may be too simplistic to describe
quasar variability across the range of timescales probed
by modern optical monitoring data (Mushotzky et al.
2011; Zu et al. 2013; Graham et al. 2014; Kasliwal et al.
2015).
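
The relation between the DRW structure function and the power law can be checked numerically. The following minimal sketch evaluates Equation (3) and its short-lag limit, Equation (4); the σ and τ values are hypothetical and chosen only for illustration.

```python
import numpy as np

def drw_sf(dt, sigma, tau):
    """Equation (3): sqrt(2) * sigma * (1 - exp(-dt / tau))**(1/2)."""
    dt = np.asarray(dt, float)
    return np.sqrt(2.0) * sigma * np.sqrt(1.0 - np.exp(-dt / tau))

def drw_sf_short_lag(dt, sigma, tau):
    """Equation (4): the small-dt limit, a power law with gamma = 1/2."""
    dt = np.asarray(dt, float)
    return np.sqrt(2.0) * sigma * np.sqrt(dt / tau)

sigma, tau = 0.2, 1.0    # hypothetical values (mag, years)
# Ratio of the full DRW form to its power law limit at a short lag.
ratio_short = drw_sf(0.01, sigma, tau) / drw_sf_short_lag(0.01, sigma, tau)
# At lags much longer than tau the DRW saturates at sqrt(2) * sigma.
plateau = drw_sf(100.0, sigma, tau)
```

At lags well below τ the two forms agree to better than one percent, while at long lags the DRW flattens and the power law keeps growing, which is the truncation described in the text.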

To fit the power law model to the observational data
for each object we used the likelihood function (Equation
4 in Schmidt et al. 2010):

    \mathcal{L}(A, \gamma) = \prod_{j,k} L_{jk}    (5)


where L_jk is the likelihood of observing one
particular magnitude difference Δm_jk between two light curve
points separated by Δt_jk. To determine the maximum
likelihood of a Gaussian distribution, as in the case of the
noise and intrinsic photometric variability, the likelihood
function is:

    \mathcal{L} = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left( -\frac{1}{2} \frac{(\Delta m_i)^2}{\sigma_i^2} \right)    (6)

The variance σ² = (A (t_j − t_k)^γ)² + σ_phot,j² + σ_phot,k²
represents the scatter around the line that we are fitting
and includes both intrinsic variability and noise. The
σ_phot,j and σ_phot,k are the measured photometric errors
on the measurements. Both the noise and the intrinsic
photometric variability are assumed to have a Gaussian
distribution.

If there were no variability or measurement noise, the
structure function would be equal to zero for all Δt. The
likelihood function now has the form:

    \mathcal{L} = \prod_{j>k} \frac{1}{\sqrt{2\pi \left[ (A (t_j - t_k)^{\gamma})^2 + \sigma_{phot,j}^2 + \sigma_{phot,k}^2 \right]}}
                  \exp\left( -\frac{(m_j - m_k)^2}{2 \left[ (A (t_j - t_k)^{\gamma})^2 + \sigma_{phot,j}^2 + \sigma_{phot,k}^2 \right]} \right)    (7)

The product only counts those observations where
j > k, so there is no double counting and there are
n(n − 1)/2 data pairs, where n is the number of observations. We
require the fitting to return physical values, A > 0 and
γ > 0, so that the power law exponent and the average
variability on a 1-year timescale are positive. This is
because we are fitting |Δm| and |Δt| and all light curves will
have some level of measurement noise, causing A > 0.
Non-variable stars generally have γ approaching 0. The
expected increasing deviation from the mean for quasars
with increasing |Δt| will cause γ > 0.
We found a strong degeneracy between A and γ when
maximizing the likelihood. To break this degeneracy, we
applied a Gaussian prior to the likelihood on A. With
a typical observing cadence of ~1 year, the prior is
centered on the observed median |Δm| value, Ā, for 0.5 years
< |Δt| < 1.5 years, and its width is the standard deviation, σ_A, of
those values. We place no explicit prior on γ in the
likelihood, but the requirement that γ > 0 functions as a flat
prior. In addition to breaking the degeneracy, this prior
encourages the minimization routine to converge on a
realistic A value more quickly. The cadence of the Stripe
82 data gives sufficient data points over this time
difference to support this constraint. We combine the log of
the likelihood function and the prior as follows:

    S = -2 \log(\mathcal{L}) + P(A)
      = \sum_{j>k}^{N} \left[ \log\left( (A (t_j - t_k)^{\gamma})^2 + \sigma_j^2 + \sigma_k^2 \right)
        + \frac{(m_j - m_k)^2}{(A (t_j - t_k)^{\gamma})^2 + \sigma_j^2 + \sigma_k^2} \right]
        + \frac{(A - \bar{A})^2}{\sigma_A^2}    (8)

where N = n(n − 1)/2 is the number of terms in the sum and P(A) is
the prior on A.

The posterior probability is maximized by
minimizing^13 Equation 8 (the negative of the posterior
probability) for each object in each of the five bands, so that
for each object there are now ten variability parameters
that can be used for classification: A_u, γ_u, A_g, γ_g, A_r,
γ_r, A_i, γ_i, A_z, and γ_z. Figure 3 shows an example for
the g-band variability parameters; note that the different
redshift ranges are well mixed (but are largely distinct
from non-quasars) in this case. In practice, our
implementation of the likelihood method is biased (10-20%
in the best-fit values), which becomes relevant when light
curves are much better sampled than those discussed
here. An approach such as that described in the
appendices of Kozlowski et al. (2010) or Hernitschek et al.
(2015b) would be more robust. However, for the sake of
this pilot investigation, our approach is more than
sufficient, particularly because any bias in the variability
parameters is the same for both selection by variability
only, and by combined color and variability selection.
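
The fitting procedure can be sketched end to end under simplifying assumptions. The example below is not the code used for the catalog: it draws toy magnitude differences directly from the assumed Gaussian model (real pairs share epochs and are therefore correlated), drops the constant term in the log-likelihood, enforces A > 0 and γ > 0 with a large penalty value, and calls Powell's method through `scipy.optimize.minimize` rather than `scipy.optimize.fmin_powell`. All variable names and numerical values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def objective(params, dt, dm, err2, A_prior, sigma_A):
    """Equation (8): -2 log(likelihood) plus a Gaussian prior on A."""
    A, gamma = params
    if A <= 0 or gamma <= 0:           # keep the fit in the physical region
        return 1e30
    var = (A * dt ** gamma) ** 2 + err2
    return np.sum(np.log(var) + dm ** 2 / var) + (A - A_prior) ** 2 / sigma_A ** 2

rng = np.random.default_rng(1)
n = 40
t = np.sort(rng.uniform(0.0, 8.0, n))          # toy epochs in years
err = np.full(n, 0.02)                          # toy photometric errors (mag)
j, k = np.triu_indices(n, k=1)                  # n(n-1)/2 unique pairs
dt = np.abs(t[j] - t[k])
err2 = err[j] ** 2 + err[k] ** 2

# Toy pair differences drawn from the model (hypothetical true parameters).
true_A, true_gamma = 0.10, 0.30
dm = rng.normal(0.0, np.sqrt((true_A * dt ** true_gamma) ** 2 + err2))

# Prior on A: median and scatter of |dm| for lags between 0.5 and 1.5 years.
near_year = (dt > 0.5) & (dt < 1.5)
A_prior = np.median(np.abs(dm[near_year]))
sigma_A = np.std(np.abs(dm[near_year]))

start = np.array([A_prior, 0.1])
fit = minimize(objective, start, args=(dt, dm, err2, A_prior, sigma_A),
               method="Powell")
A_fit, gamma_fit = fit.x
```

Because the prior is centered on a median of |Δm| rather than on a root mean square, it pulls A toward a lower value than the model amplitude, which is one plausible contributor to the kind of bias noted in the text.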

We currently fit the structure function to the
multi-epoch data for all bands separately to compare their
performance in the NBC KDE selection algorithm (see
Section 3). However, there are several ideas on how best
to combine the observations in all five bands to obtain
one light curve and one structure function to describe
the overall variability. These methods are complicated
by differences in how quasars vary in the different bands.


The product only counts those observations where 
j > k, so there is no double counting and there are 


13 Using SciPy's optimization package, Powell's method: scipy.optimize.fmin_powell.

Figure 3. Quasar and non-quasar training sets in variability parameter
space for the g-band observations. Note that, unlike in
the color-color plots in Figure 1, there are no distinct changes in
the variability parameters as a function of quasar redshift in this
parameter space. This is advantageous because it allows us to separate
the quasars from the non-quasars in the variability space without
extreme changes in completeness at specific redshifts, as seen
with color selection. Non-quasars, such as stars and normal galaxies,
are shown in orange contours. Quasars are shown in cool colors
as three redshift regions: low-redshift (z < 2.2; shown as green contours
and scatter points for outliers), mid-redshift (2.2 < z < 3.5;
shown as dark blue contours and scatter points for outliers), and
high-redshift (z > 3.5; shown as light blue dots).


For example, different bands represent different distances 
in the accretion disk resulting in a time lag between the 
bands and different characteristic timescales. 

As shown in Figure 4, there are different amplitudes
of variability in different bands. Additionally, Lyα absorption
obscures the true variability of quasars at high
redshift. This is quite apparent in the u-band (top left
panel), where the measured variability parameters for
high-z quasars are caused by the high photometric errors
of the u-band dropouts. It is also recognized that quasars
become more luminous as they become bluer (Schmidt et
al. 2010, 2012) and that bluer quasars in general are
more variable (Vanden Berk et al. 2004; ... 2010).
Both of these effects must be taken into account
when combining observations to describe the overall variability.
A further complication for LSST will be the
non-simultaneity of the observations in different bands.
Thus, proper treatment of the combined variability data
is complex and beyond the scope of this paper. For our
purposes, describing the variability in each band is sufficient,
and we therefore proceed with fitting the structure
function for each of the bands separately.


2.5. Test Set and Training Sets 

Now that we have described the data inputs to our
algorithm we can formally define the test and training
sets. The test set begins with all stellar morphology
(objc_type == 6) objects on the SDSS Stripe 82 with
observations in DR7. Restricting our sample to point
sources allows us to concentrate on the improvements
gained by combining colors and variability without having
to worry about the differences in color and variability
at redshifts and luminosities where the host galaxy
contributes significantly to these properties. This set of
observations was then limited by the following criteria:
-40° < RA < 55°, g - i < 6.0, g < 23.5, i < 22,
σ_g < 0.5, and σ_i < 0.33. These cuts are intended to reduce
scatter due to high stellar density near the Galactic
plane, high dust obscuration, and non-astrophysical colors.
Observations with flags indicating poor photometry,
such as those discussed in Section 3.2 of Richards et al.
(2002), were also excluded. There are 1,163,174 objects
with 49,274,136 observations that meet these cuts.

Only objects where we had sufficient observations to
calculate variability parameters in all five bands and astrometric
parameters in u and g were included in the
test and training sets. Additionally, we require coadded
colors -1.0 < u - g < 9.0, -0.75 < g - r < 2.5,
-0.5 < r - i < 3.0, and -1.5 < i - z < 1.75, to constrain
the parameter space for the NBC KDE to limit
the necessary computational time for objects with unusually
deviant colors. After these cuts, 916,587 objects
remain. These objects compose the cleaned data set. The
test set consists of the 903,366 sources from the cleaned
data set that have not been spectroscopically identified
as quasars.
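The selection cuts above can be expressed as boolean masks. The sketch below is illustrative only: the function and argument names are hypothetical, right ascension is assumed to be supplied in degrees on [0, 360) and is wrapped so that the -40° limit is meaningful, and the 0.33 limit is assumed here to apply to the i-band photometric error.

```python
import numpy as np

# Allowed coadded-color ranges quoted in the text (magnitudes).
COLOR_BOX = {
    "u-g": (-1.0, 9.0),
    "g-r": (-0.75, 2.5),
    "r-i": (-0.5, 3.0),
    "i-z": (-1.5, 1.75),
}


def photometric_cuts(ra_deg, g, i, g_err, i_err):
    """Boolean mask for the RA, magnitude, color, and error cuts."""
    ra = (np.asarray(ra_deg) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    g, i = np.asarray(g), np.asarray(i)
    return ((ra > -40.0) & (ra < 55.0) & (g - i < 6.0) & (g < 23.5)
            & (i < 22.0) & (np.asarray(g_err) < 0.5)
            & (np.asarray(i_err) < 0.33))  # i-band error limit is an assumption


def in_color_box(colors):
    """Boolean mask for objects whose four coadded colors lie inside COLOR_BOX."""
    first = np.asarray(next(iter(colors.values())))
    mask = np.ones(first.shape, dtype=bool)
    for name, (low, high) in COLOR_BOX.items():
        values = np.asarray(colors[name])
        mask &= (values > low) & (values < high)
    return mask


# Two illustrative objects: the first passes every cut, the second fails on RA.
keep = photometric_cuts([350.0, 100.0], [20.0, 20.0], [19.5, 19.5],
                        [0.05, 0.05], [0.04, 0.04])
print(keep)
```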

The quasar training set is formed from the 13,221 spectroscopically
confirmed quasars in the MQC that have
matches in the cleaned data set. To keep computational
time reasonable, we select a subsample of 72,680 non-matches
for the non-quasar training set. As with our
previous work (e.g., Richards et al. 2009a), we note that
the vast majority of these non-quasar training set objects
are not actually spectroscopically confirmed to be
non-quasars and thus there will be some level of contamination,
as is discussed further in Section 3. We do not
explicitly include or exclude spectroscopically confirmed
stars or galaxies in the non-quasar training set as most of
these were selected as quasars (and found to be contaminants)
and are thus biased in their color-space distribution.
In practice, when we run the classification on the
test set we include the training set objects so that our
catalog of candidate objects includes the known quasars,
making it easier to determine our completeness of these
sources.


Figure 4. All spectroscopically confirmed quasars shown in A vs. γ space in each of the SDSS bands, colored by redshift. Shown to
demonstrate the difficulty involved in combining the observations in all five bands to obtain one light curve and one structure function in
order to describe the overall variability without previously knowing the object's redshift. Note how the distribution of points shifts with
band and with redshift. In particular, A and γ values agree well in the g, r, and i bands, but the large photometric errors in the u and z bands
artificially increase the apparent amplitude of the variability.

3. NBC KDE ALGORITHM

Using training sets described in Section 2.5, classification
of the test set objects (based on parameters
described in Sections 2.3 and 2.4) was performed using
Non-parametric Bayesian Classification (NBC) based
on applying Kernel Density Estimation (KDE) to select
quasars; see Richards et al. (2004), Gray et al. (2005),
and Riegel et al. (2008). The algorithm takes training
sets of objects divided into quasars and non-quasars. It
creates an N-dimensional probability space for each of
the classes, where N is the number of parameters that
describe each type of object and the parameter space
is normalized to give equal weight to each parameter
(Gray et al. 2005). A probability density function (PDF)
is constructed for each class of objects using KDE, by
representing each individual object within a class by an
N-dimensional Gaussian distribution and summing together
the result for each object. Using the NBC KDE
selection algorithm it is possible to combine all the classification
parameters (u - g, g - r, r - i, i - z, A_u, γ_u, A_g,
γ_g, A_r, γ_r, A_i, γ_i, A_z, and γ_z) and perform the classification
simultaneously considering all the characteristics
to determine if the object is a quasar or a non-quasar.
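The kernel density construction described in this section can be sketched in a few lines of NumPy. This is a brute-force illustration, not the tree-based implementation cited above: the function names are hypothetical, a single isotropic bandwidth is assumed for all dimensions, and the rescaling to zero mean and unit variance is one assumed way of giving each parameter equal weight.

```python
import numpy as np


def standardize(train, points):
    """Rescale both arrays with the training-set mean and standard deviation."""
    mean, std = train.mean(axis=0), train.std(axis=0)
    return (train - mean) / std, (points - mean) / std


def kde_density(train, points, bandwidth):
    """Sum of N-dimensional Gaussian kernels centered on the training objects.

    train: (n, d) array of training objects; points: (m, d) query positions.
    Returns the normalized class density at each query position.
    """
    train, points = np.atleast_2d(train), np.atleast_2d(points)
    n, d = train.shape
    sq_dist = ((points[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    kernels = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return kernels.sum(axis=1) / (n * (2.0 * np.pi * bandwidth ** 2) ** (d / 2))


# Illustrative 2-D example with three training objects and one query point.
training = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]])
scaled_train, scaled_query = standardize(training, np.array([[0.5, 0.5]]))
print(kde_density(scaled_train, scaled_query, bandwidth=0.5))
```

One such density is built for the quasar training set and one for the non-quasar training set, each with its own bandwidth.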

From this PDF, the probability of an unclassified object
being a quasar or non-quasar can be calculated, but
first we need an understanding of the real-world ratio of
quasars to non-quasars. When a new point is placed in
the PDF, the probability of it being a quasar or a non-quasar
is weighted by its prior probability. This prior is
an expectation of how many of the unknown objects are
non-quasars. This weighting is an application of Bayes'
Theorem:


$$
P(M|D,I) = \frac{p(D|M,I)\,P(M|I)}{P(D|I)}.
\tag{9}
$$


In Equation 9, Bayes' Theorem (Bayes 1763; Ivezić et
al. 2014, Chapter 5), D stands for data, M for model,
and I for prior information. This relates the posterior
for the model based on the likelihood given the data and
a prior. The pair of multi-dimensional weighted PDFs
measures the probability of an unknown object being a
quasar or a non-quasar, while taking into account the
expected ratio of quasars to non-quasars, and classifies it
accordingly. Throughout this work we use a prior of 0.95,
meaning that we expect 95% of the objects to be non-quasars.
The lower limit for the prior is determined by
the fraction of known quasars in the test set. In Richards
et al. (2009a) the ratio of quasar candidates to the test
set was 2.6%. We use a slightly lower non-quasar prior
(0.95, compared with the 0.974 implied by that ratio) to capture
some of the quasars that Richards et al. (2009a) did not.
We assumed the prior to be independent of position on
the sky and magnitude. Changing the prior by 1% does
not change the number of quasar candidates by 1% of
the test set, but changes the number by roughly 1% of
the quasar candidates (Richards et al. 2015, submitted).
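The weighting by the prior can be written out explicitly. In the hedged sketch below, the two class densities are taken as given (for example, from a KDE of each training set), and the function name and argument names are illustrative assumptions; only the non-quasar prior of 0.95 comes from the text.

```python
def quasar_posterior(density_quasar, density_nonquasar, nonquasar_prior=0.95):
    """Posterior probability that an object is a quasar, via Bayes' Theorem.

    density_quasar and density_nonquasar are the class-conditional densities
    p(D|quasar) and p(D|non-quasar) evaluated at the object's parameters.
    """
    quasar_prior = 1.0 - nonquasar_prior
    numerator = density_quasar * quasar_prior
    evidence = numerator + density_nonquasar * nonquasar_prior
    return numerator / evidence


# With equal class densities the posterior simply returns the quasar prior.
print(quasar_posterior(1.0, 1.0))
# The quasar density must exceed the non-quasar density by a factor of 19
# before the posterior reaches the 0.5 classification threshold.
print(quasar_posterior(19.0, 1.0))
```

With a non-quasar prior of 0.95, an object is classified as a quasar only where the quasar density is more than 0.95/0.05 = 19 times the non-quasar density.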

The algorithm requires a bandwidth for each of the
training sets. The bandwidth controls the width of the
kernel (a Gaussian distribution in our case) used to build
the KDE. It is important to choose an optimal bandwidth
when calculating the KDE or the distribution will be too
smooth (under-fit) or too structured (over-fit), in
the same way as choosing an incorrect bin size for a histogram.
The optimal bandwidth was found by performing
leave-one-out cross-validation (leaving one object out
and using the remainder of the training set to classify)
over a range of bandwidths. We also refer to this as a
self test.
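A leave-one-out self test of this kind can be sketched as follows. The code is a brute-force illustration under stated assumptions: Gaussian kernels with one isotropic bandwidth per class, a non-quasar prior of 0.95, and hypothetical function names. An object's own kernel is removed from its class density before it is classified.

```python
import numpy as np


def kernel_matrix(points, centers, h):
    """Gaussian kernel values between every query point and every center."""
    d = centers.shape[1]
    sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq / h ** 2) / (2.0 * np.pi * h ** 2) ** (d / 2)


def loo_density(sample, h):
    """Density of each object from all other objects in the same class."""
    k = kernel_matrix(sample, sample, h)
    np.fill_diagonal(k, 0.0)  # leave the object itself out
    return k.sum(axis=1) / (len(sample) - 1)


def self_test(quasars, nonquasars, h_q, h_nq, nonquasar_prior=0.95):
    """Return (quasars classified as quasars, non-quasars classified as quasars)."""
    quasar_prior = 1.0 - nonquasar_prior
    pq_at_q = loo_density(quasars, h_q)
    pn_at_q = kernel_matrix(quasars, nonquasars, h_nq).mean(axis=1)
    pq_at_n = kernel_matrix(nonquasars, quasars, h_q).mean(axis=1)
    pn_at_n = loo_density(nonquasars, h_nq)
    # A posterior above 0.5 is equivalent to the weighted quasar density winning.
    q_as_q = pq_at_q * quasar_prior > pn_at_q * nonquasar_prior
    n_as_q = pq_at_n * quasar_prior > pn_at_n * nonquasar_prior
    return q_as_q, n_as_q


# Two well-separated illustrative clusters in a 2-D parameter space.
quasars = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]])
nonquasars = np.array([[5.0, 5.0], [5.2, 5.0], [5.0, 5.2]])
q_as_q, n_as_q = self_test(quasars, nonquasars, h_q=0.5, h_nq=0.5)
print(q_as_q, n_as_q)
```

Repeating `self_test` over a grid of `h_q` and `h_nq` values and scoring each pair gives the bandwidth search described above.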

This process was repeated to find the optimal bandwidth
based on the product of completeness and efficiency.
Completeness is defined as the number of known
quasars correctly classified as quasars divided by the
number of known quasars. It is also referred to as sensitivity.
Efficiency is defined as the number of known
quasars correctly classified as quasars divided by the
number of objects (known quasars and non-quasars) classified
as quasars. It is also referred to as purity. Different
metrics could be chosen depending on the desired science
and whether completeness is needed over efficiency, but
we use the product of completeness and efficiency as a
middle ground for this proof of concept. That is, an efficiency
of 85% and a completeness of 70% (product 0.595) is considered
a better selection than an efficiency of 99% and a completeness
of 55% (product 0.545).
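The two metrics and their product are simple ratios of counts. The sketch below uses hypothetical function names; the worked numbers are the single-epoch colors counts reported in Table 2 (8232 of 13,221 quasars recovered, and 69,566 - 68,611 = 955 non-quasars classified as quasars).

```python
def completeness(quasars_as_quasars, n_known_quasars):
    """Fraction of known quasars that are classified as quasars."""
    return quasars_as_quasars / n_known_quasars


def efficiency(quasars_as_quasars, nonquasars_as_quasars):
    """Fraction of objects classified as quasars that are known quasars."""
    return quasars_as_quasars / (quasars_as_quasars + nonquasars_as_quasars)


def figure_of_merit(comp, eff):
    """Product of completeness and efficiency used to rank selections."""
    return comp * eff


# Single-epoch colors self test, using the counts listed in Table 2.
comp = completeness(8232, 13221)
eff = efficiency(8232, 69566 - 68611)
print(round(comp, 4), round(eff, 4), round(figure_of_merit(comp, eff), 4))

# The comparison quoted in the text: 70%/85% is preferred over 55%/99%.
print(figure_of_merit(0.70, 0.85) > figure_of_merit(0.55, 0.99))
```

The first pair of values reproduces the color-only entries for single-epoch colors in Table 3.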

After an initial self-classification of the training set is
done, all those objects in the non-quasar training set
that were classified as quasars in the self test are removed.
This process is expected to remove the majority
of quasars that may have contaminated the non-quasar
training set due to lack of prior spectroscopic confirmation.
This new "cleaned" non-quasar training set is used
for the final classification. This cleaning process is a single
iteration process and is performed separately for each
of the classifications that we attempt below.
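The single cleaning pass amounts to masking the non-quasar training set with the self-test outcome. A minimal sketch, with a hypothetical function name and toy data:

```python
import numpy as np


def clean_nonquasar_training_set(nonquasars, classified_as_quasar):
    """Drop non-quasar training objects that the self test labeled as quasars."""
    keep = ~np.asarray(classified_as_quasar, dtype=bool)
    return np.asarray(nonquasars)[keep]


# Toy example: the second of three non-quasar objects was labeled a quasar.
toy_nonquasars = np.array([[1.2, 0.4], [0.1, 0.0], [1.5, 0.6]])
toy_flags = np.array([False, True, False])
cleaned = clean_nonquasar_training_set(toy_nonquasars, toy_flags)
print(len(cleaned))
```

Because the cleaning is repeated for every choice of input parameters, the size of the cleaned non-quasar set differs from one self test to the next.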

Having established the quasar prior probability, the
quasar training set, a "cleaned" non-quasar training set,
and the bandwidths for each of the training sets, we can
proceed to classification of the unknown sources (i.e., the
test set). Application of the NBC KDE algorithm results
in each object receiving a binary quasar vs. non-quasar
classification, bifurcated at P(M|D, I) = 0.5. In the future,
it may make more sense to simply output a probability
for each object to facilitate combining this information
with other data, but for the sake of this pilot study,
we have chosen to make a hard cut (but in probability
space rather than color space).

We explore which set of parameters (color, variability,
or both) produces the best results in Section 4; then we
will apply the algorithm to the test set to obtain a set of
quasar candidates in Section 5.

4. TESTING CLASSIFICATION PARAMETERS 

Our goal is to establish whether combining color and
variability information in quasar selection is superior to
using just colors or variability alone. To accomplish this
goal, the NBC KDE algorithm was used in a series of self
tests, which consist of performing leave-one-out cross-validation
on the training sets (rather than on a test
set). The object being classified is not included in the
training set and the process is repeated for each object
in the training sets. The classifications returned by the
algorithm are compared to the known classifications of
the objects to estimate the completeness and efficiency
of selection using those particular input parameters.

Section 4.1 uses the NBC KDE algorithm with the
above quasar and non-quasar training sets to perform
a self test using colors alone. This process serves as our
basis of comparison: do other parameters enable more robust
quasar selection than colors alone? In Section 4.2
we attempt variability-only classification along with combined
color and variability classification. We then compare
the results of these self tests. This process reveals
which variability (and color) parameters yield the most
robust classification.


4.1. Classification Using Color 

Our first self test was performed using only the single-epoch
SDSS adjacent colors (u - g, g - r, r - i, i - z) as
inputs to the algorithm. In practice, we chose a random
epoch (meeting our requirements for good photometric
and astrometric data) for each object. Using single-epoch
data is the fairest comparison for the majority of the
objects in the SDSS footprint and we can use this as a
control to compare how our method improves selection
by adding variability. We could have chosen the "best"
epoch for optimal classification by single-epoch colors
alone; however, as we are testing the improvement from
adding variability to the color classification, any epoch
with quality data will serve.

The results of the classification are shown in Table 2,
row 1, which indicates that these parameters are successful
at not classifying non-quasars as quasars, at the
expense of missing more than 37% of known quasars.
Indicative of the well-known problem of separating high-redshift
quasars from the locus of moderate-to-cool temperature
stars (e.g., Richards et al. 2002), most of these
missing quasars are at high redshift, as can be seen from
Figure 5. On the other hand, low-redshift quasars, which
can be selected robustly by traditional color cuts, are also
easily identified using the NBC KDE algorithm, as shown
in Richards et al. (2004).

The completeness of our single-epoch selection is distinctly
different from Richards et al. (2006): it is seemingly
too high at low-z (given our restriction to point
sources) and too low at high-z. For low-z this merely reflects
the completeness of point sources. At high-z it
is important to realize that in Richards et al. (2006)
the purpose was to perform as complete a selection as
possible, with efficiency as low as 50%, using hard color
cuts. We will discuss how complete our selection is for
all quasars, including extended sources, in Section 8.

Figure 5. Fraction of quasars correctly classified as quasars (completeness). These panels demonstrate that we are able to separate the
quasars from the non-quasars in the variability space without extreme changes in completeness at specific redshifts. In both panels the
gray line shows the number of quasars in each bin (right axis) and light blue (single epoch) and peach (coadded epochs) histograms show
the completeness of color-only selection (left axis, Section 4.1). Note the catastrophic loss of high-z quasars from single-epoch colors and
the incompleteness at z ~ 2.8 even for coadded colors. We also show classification from variability only: single bands (left panel) and
combinations of multiple bands (right panel). The g, r, and i bands are shown as blue, green, and orange lines respectively. There are no
dramatic drops in the g-, r-, or i-band variability at distinct redshifts, just a gradual decline with increasing redshift, which is related
to observed magnitude, signal-to-noise ratio, and time scale of variability in the observer's frame. The overall completeness using variability
alone is not as high as coadded colors alone at low redshifts, but is more successful than single-epoch colors alone at high redshifts.


Table 2

NBC KDE Results - Self Test Non-quasar and Quasar Fraction

Self Test                               Non-quasars as non-quasars    Quasars as quasars
                                        Correct   Total     Fraction  Correct   Total     Fraction
single epoch colors                     68611     69566     0.986     8232      13221     0.623
coadded colors                          69474     69738     0.996     12353     13221     0.934
u variability                           70970     71936     0.987     5550      13221     0.420
g variability                           69489     70040     0.992     11138     13221     0.842
r variability                           69998     70476     0.993     11137     13221     0.842
i variability                           69935     70397     0.993     10782     13221     0.816
z variability                           70665     71372     0.990     5403      13221     0.409
g & r variability                       69777     70054     0.996     12060     13221     0.912
r & i variability                       69714     70050     0.995     11933     13221     0.903
g, r, & i variability                   69728     70034     0.996     12150     13221     0.919
coadded colors; u variability           69644     70077     0.994     12311     13221     0.931
coadded colors; g variability           69822     70114     0.996     12739     13221     0.964
coadded colors; r variability           69912     70229     0.996     12741     13221     0.964
coadded colors; i variability           69880     70157     0.996     12634     13221     0.956
coadded colors; z variability           69682     69990     0.996     12359     13221     0.935
coadded colors; g & r variability       69663     70081     0.994     12816     13221     0.969
coadded colors; r & i variability       69658     70096     0.994     12800     13221     0.968
coadded colors; g, r, & i variability   69948     70108     0.998     12626     13221     0.955

Note. Fraction of non-quasars correctly classified as non-quasars and quasars
correctly classified as quasars from the leave-one-out cross-validation of the training sets.
The non-quasar total is different in the different rows because the non-quasar training
set is "cleaned" before it is used for the final classification, as described in Section 3.
The bandwidths are chosen to optimize the product of completeness and efficiency.

Figure 6. Fraction of quasars correctly classified as quasars using coadded colors and variability, as a function of redshift. Notice the
improved completeness near redshifts 2.7 and 3.5, where the quasars and non-quasars overlap in color space, with the addition of variability
parameters. Shown are single bands of variability combined with coadded colors (left panel) and combinations of multiple bands of variability
combined with coadded colors (right panel). In both panels the gray line shows the number of quasars in each bin (right axis).


Table 3

NBC KDE Results: Self Test Completeness and Efficiency

Self Test               Variability Only                    Single Epoch Colors w/ Variability  Coadded Colors w/ Variability
                        Completeness      Efficiency        Completeness      Efficiency        Completeness      Efficiency
color only              ...               ...               0.6226            0.8960            0.9343            0.9791
u variability           0.4198            0.8517            0.6934            0.9289            0.9312            0.9660
g variability           0.8424            0.9529            0.8372            0.9149            0.9635            0.9776
r variability           0.8424            0.9588            0.8583            0.9165            0.9637            0.9757
i variability           0.8155            0.9589            0.8126            0.9235            0.9556            0.9785
z variability           0.4087            0.8843            0.7158            0.9214            0.9348            0.9757
g & r variability       0.9122            0.9775            0.8115            0.9758            0.9694            0.9684
r & i variability       0.9026            0.9726            0.8076            0.9734            0.9682            0.9669
g, r, & i variability   0.9190            0.9754            0.8573            0.9761            0.9550            0.9875

Note. Completeness (known quasars classified as quasars divided by known quasars) and efficiency
(known quasars classified as quasars divided by all objects classified as quasars) for each of the self tests described
in Section 4.2. This indicates that the most successful option is a combination of coadded colors and variability,
but no particular variability bands stood out when in combination with colors.

In the SDSS Stripe 82 region, where we will conduct
our experiments on variability selection of quasars, we
are able to combine multiple epochs of imaging data
to produce more accurate color measurements of the
quasars (as discussed in Section 2.1). Thus, we perform a
second self test using coadded colors for each object. Table 2,
row 2 demonstrates that the use of coadded colors
yields a small improvement in the efficiency of the sample,
but a large improvement in the completeness, which is now
93%. Figure 5 shows that most of this
improvement comes from the recovery of high-redshift
quasars; smaller photometric errors make it easier to distinguish
the high-redshift quasar distribution from stars.
However, there is still a dip at z ~ 2.8 where even the
coadded colors do not enable better than 75% completeness.


4.2. Choosing Optimal Classification Parameters 


Variability alone can be the basis for a robust quasar
classification (e.g., Schmidt et al. 2010; Butler & Bloom
2011; MacLeod et al. 2011), so we next perform a self
test by applying KDE to the pair of variability parameters
for each band (as defined in Section 2.4) and then
on combinations of variability parameters from the multiple
bands. The results are shown in Table 2 and Figure 5.
It is interesting to compare the performance of the
bands because each represents different distances from
the center of the accretion disk, different characteristic
timescales, and different (redshift-dependent) peak amplitudes.

Particularly important is that variability selection has
a higher completeness in the range 2.6 < z < 3.0 than do
colors. There are no significant trends with redshift in
the A-γ space in the g, r, and i bands, so the quasars can
be separated out from the non-quasars in the variability
space without completeness issues at specific redshifts
(unlike the dramatic drops seen for color-only selection).
The completeness drops off gradually with higher redshift,
which is a result of changes in observed magnitude,
signal-to-noise ratio, and time scale of variability in the
observer's frame. Combining g and r, r and i, and g, r,
and i, we find similar trends as using just the variability
parameters from a single band, with marginally higher
completeness (and efficiency) at all redshifts.

Selection by u- and z-band variability performs much
worse than both coadded and single-epoch colors. The
u band is strongly influenced by Lyα forest absorption
of the (variable) quasar continuum at high redshift, thus
suppressing the signal-to-noise ratio. This results in discordant
variability parameters for quasars that are quite
apparent in Figure 4. The lower performance of the z-band
is likely due to the lower signal-to-noise ratio of
the photometry and thus the larger scatter of the variability
parameters, as seen in Figure 4. These discrepant
values increase the probability of high-redshift quasars
being classified as stars.

While variability selection produces more consistent results
with redshift than color selection, we find that, at
many redshifts, color selection is still superior. We thus
consider coadded colors with combinations of variability
parameters from single and multiple bands. The results
are shown in Table 2 and Figure 6. Adding variability
parameters from just one band significantly improves the
selection, especially for the high signal-to-noise ratio bands
g, r, and i. The addition of the u- and z-band variability
to colors still fails at z ~ 2.8 because the variability signal
is not strong enough (as demonstrated in Figures 4 and
5) to overcome color selection bias.

We graphically summarize the results of the self tests in
Figure 7. Quasar completeness as a function of redshift is
shown in the left panel, quasar completeness as a function
of i magnitude in the center panel, and quasar efficiency
as a function of i magnitude in the right panel. For colors
alone, both coadded and single epoch, there are regions
of color space where the quasar training set and non-quasar
training set overlap, resulting in redshift regions
with poor completeness. Variability alone, as demonstrated
by the r-band selection, does not have these redshift
trends, but has a lower efficiency than coadded colors
at all other redshifts. The addition of coadded colors
to the r-band variability information helps to improve
upon the colors alone at all redshifts, but in particular
in the dips at z ~ 2.7 and z ~ 3.5. Using coadded colors
together with variability in multiple bands improves
the classification even further (e.g., compare the solid
green lines to the dotted green lines). The left panel of
Figure 7 shows that adding the i-band variability makes
things worse (possibly because the i-band has a lower
signal-to-noise ratio than g or r given that quasars generally
have blue spectral energy distributions), but note
that there are relatively few high-redshift objects and the
middle panel shows that the loss of completeness is coming
from very faint objects. Moreover, the right panel
shows that adding the i-band variability improves the
efficiency. Table 3 shows that while adding the i-band
variability reduces the completeness by about 1% (from 0.9694
to 0.9550), it compensates by increasing the efficiency by
about 2% (from 0.9684 to 0.9875).

These self tests of the quasar and non-quasar training
sets validate our hypothesis that the most successful
option is a combination of coadded colors and variability.
No combination of colors and variability was highest
in both completeness and efficiency; however, the combination
of coadded colors and both g and r variability
parameters gives the most robust selection, with a combined
product of completeness and efficiency of 93.88%
(see Table 3), and was consistent in completeness across
all redshift values (see Figure 6). As such, for our analysis
of the test set in the next section, we have adopted
coadded colors with both g and r variability parameters
as our basis set.

5. BUILDING A QUASAR CANDIDATE CATALOG 

Now that the most efficient set of parameters has been chosen,
in Section 5.1 the algorithm is applied to the test
set using the full quasar training set. Finally, in Section 5.2
we test a process where the algorithm is used to
perform simultaneous classification and redshift estimation.
Specifically, the test set is classified using a series
of quasar training sets that each contain only quasars from
a limited redshift range.

5.1. Classifying the Test Set 

In the previous section we identified coadded colors
combined with both g and r variability as producing the
best classification for the training set objects. We now
apply the selection to the test set. The NBC KDE algorithm
was used to perform an 8-D classification (u - g,
g - r, r - i, i - z, A_g, γ_g, A_r, and γ_r), using the same
bandwidths used during the self tests and an identical
prior. The objects identified as quasar candidates, with
P(Q|D) > 0.5, are listed in the catalog (available online),
which is described in more detail in Section 7.

Figure 7. Comparison of self tests using different combinations of color and variability. These panels demonstrate that the combination
of color and variability gives the best results for completeness and efficiency as a function of redshift and magnitude, with more details
in the text. Shown are the completeness (known quasars classified as quasars divided by known quasars) as a function of redshift (left
panel), completeness as a function of coadded i-band magnitude (center panel), and efficiency (known quasars classified as quasars divided
by all objects classified as quasars) as a function of coadded i-band magnitude (right panel). The gray line shows the number of quasars in
each bin (right axis).

The results of the classification are shown in Figure 8.
We will discuss the new candidate quasars, their characteristics,
and contaminants in Sections 7 and 8. In
general, the candidate quasars (green contours) closely
mirror the distribution of the known quasars (orange
contours) and extend slightly beyond in the parameter
space. The incorrectly classified quasars lie in the area
where quasars and non-quasars overlap in color and variability
space. When comparing to the quasar distribution
as a function of redshift shown in Figure 1, the candidate
quasars extend beyond the known quasars into
mid-redshift and high-redshift regions of color space. The
candidate quasars have a higher density in the areas overlapping
the non-quasars (gray contours) than the known
quasars. This could be caused by the variability parameters
selecting quasars that were missed by color selection
because they are hidden in the stellar locus, or by stellar
contaminants in our selection. There are also some new
candidates in the bluest corner of g - r vs. r - i color
space, which are likely white dwarf contaminants that we
will attempt to purge in Section 7.


5.2. Classification using Redshift Bins 

Quasar colors depend on redshift, as shown in Figure 1.
As such, it is possible to identify quasars while simultaneously
estimating their redshifts (e.g., Suchkov et al. 2005;
Bovy et al. 2012). We test the extension of our method in
a similar manner simply by limiting the quasar training
set to a narrow redshift region. By doing so, we are able
to select quasars with colors similar to other quasars of
that redshift, thereby simultaneously providing a rough
estimate of the redshift.

To accomplish this, the full quasar training set (see
Section 2.5) was divided into 18 separate training sets
by redshift: non-overlapping redshift bins from 0.4 to 4.0
with a bin width of 0.2. The quasars outside each redshift
bin were added to the non-quasar training set. A handful
of quasars that were significant outliers (5σ) from the
modal color in each bin were removed from the quasar
training set. These outliers could be caused by errors in
the photometry and/or heavy dust reddening. Including
them caused us to find objects with those colors that are
not really quasars or are quasars at a different redshift.
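The construction of the redshift-bin training sets can be sketched as follows. The function names are hypothetical, the half-open bin convention [lo, hi) is an assumption, and the 5σ outlier rejection is not implemented here.

```python
import numpy as np


def redshift_bin_edges(z_min=0.4, z_max=4.0, width=0.2):
    """Edges of the non-overlapping redshift bins (18 bins for the defaults)."""
    n_bins = int(round((z_max - z_min) / width))
    return np.linspace(z_min, z_max, n_bins + 1)


def split_by_redshift_bin(z, edges):
    """Yield (lo, hi, in_bin) for each bin.

    Quasars with in_bin == True form that bin's quasar training set; the
    remaining quasars would be appended to the non-quasar training set.
    """
    z = np.asarray(z)
    for lo, hi in zip(edges[:-1], edges[1:]):
        yield lo, hi, (z >= lo) & (z < hi)


edges = redshift_bin_edges()
example_z = np.array([0.5, 2.05, 3.9])
for lo, hi, in_bin in split_by_redshift_bin(example_z, edges):
    if in_bin.any():
        print(f"{lo:.1f} <= z < {hi:.1f}: {int(in_bin.sum())} quasar(s)")
```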

As above, a self test was performed on the training 
sets for each redshift bin to find the optimal bandwidths. 
Specifically, the redshift-bin training sets were used to 
classify the full quasar training set (13,221 quasars span¬ 
ning the full redshift range). The results of these self 
tests are shown in Table [4] and Figures [9] and [To] These 
show that the completeness of quasar classification (both 
identifying known quasars as quasars and also as being 
in the correct redshift bin) is generally better than 75%. 
The contamination (here quasars from the wrong redshift 
bin being selected) is typically less than 10%. 

Of the 13,221 training set quasars, 12,535 were
classified in at least one bin (94.8% overall completeness).
These objects are shown as a density plot in Figure 10 in
Δz = 0.2 photometric redshift bins. The regions of
misclassification at spectroscopic redshifts ~ 0.75 and ~ 2.1
stem from degeneracies in color-redshift space.

With the self test completed, we finally classify the test
set described in Section 2.3, the same that was classified
in Section 5.1. For each of the non-overlapping redshift
bins from 0.4 to 4.0, each object in the test set is returned
as either a quasar candidate or a non-quasar candidate.
If it is found to be a quasar candidate, we calculate the
quasar probability (in addition to the initial binary
classification). Many objects were found to be quasar
candidates in several bins and the classification probability
in each bin was calculated. Results of the classification
are given in Table 5. Figure 11 shows the results of the
classification in color and variability parameter space, as
in Figure 8. We discuss the difference between this selection
and the selection of Section 5.1 in Section 7. An analysis
of the quasar candidates is performed in Section 8.
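
Because an object can be a candidate in several bins, a rough redshift follows from the bin with the highest quasar probability (the quantity compared to spectroscopic redshift in Figure 10). A minimal sketch, assuming the per-bin probabilities are held in a dictionary keyed by bin edges; that data layout and the function name are hypothetical.

```python
# Sketch: pick the most probable redshift bin for one object.
# The dict layout {(z_low, z_high): probability} is an assumption.

def most_probable_bin(bin_probabilities):
    """Return the (z_low, z_high) key with the largest probability.

    Returns None when the object was not a candidate in any bin.
    """
    if not bin_probabilities:
        return None
    return max(bin_probabilities, key=bin_probabilities.get)
```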

6. REDSHIFT ESTIMATION

In this section we will improve on the accurate, but not
precise, redshift estimation of Section 5.2 and compute
photometric redshifts for the quasar candidates. First,
we will describe the astrometric information (Section 6.1)
and near-infrared colors (Section 6.2) that will be used
in addition to optical colors (Section 2.3). We combine
these inputs to calculate photometric redshifts using the

Figure 8. Color and variability parameter space plots showing the results of test set classification using a single quasar training set
covering the full quasar redshift range (Section 5.1). These panels demonstrate that the incorrectly classified quasars lie in the area where
quasars and non-quasars overlap in color and variability space and that the candidate quasars closely mirror the distribution of the known
quasars and extend slightly beyond in the parameter space (including a region known to be inhabited by white dwarfs in the blue corner
of the upper right panel). Colors left panel: u - g vs. g - r; colors right panel: g - r vs. r - i; variability left panel: A_g vs. γ_g; and
variability right panel: A_r vs. γ_r. Objects in the test set classified as non-quasars are shown as gray contours; quasar candidates that are
not spectroscopically identified are shown as green contours and scatter points for outliers; spectroscopically identified quasars classified
as quasars are shown as orange contours and scatter points for outliers; and spectroscopically identified quasars incorrectly classified as
non-quasars are shown as purple dots. The red dashed line in the upper right panel is the white dwarf cut described in Eq. 12.
a Levels for contours in Figures 8 and 11: gray: colors - 95%, 90%, 80%, 60%, 40%, 20%, variability - 98%, 95%, 90%, 80%; green: colors
- 90%, 80%, 60%, 40%, 20%, variability - 90%, 80%, 60%; orange: 90%, 80%, 60%, 40%, 20%.


method described in Weinstein et al. (2004). We compare
the robustness of our different redshift estimates in
Section 6.3.


6.1. Astrometry

In addition to colors, our analysis will make use of
astrometric measurements of quasars (Kaczmarczik et al.
2009). Light rays from extraterrestrial sources are bent
according to Snell’s law as they enter the Earth’s
atmosphere from the vacuum of space. A celestial source
observed from the Earth will appear higher in the sky
than it actually is, unless it is at the zenith. The amount
of this deflection depends on the index of refraction of
the air and the photon’s angle of incidence. Since the
index of refraction of air is a function of wavelength,
shorter wavelength photons are bent more than longer
wavelength photons. This effect is known as differential
chromatic refraction (DCR).
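
For orientation, the standard first-order (plane-parallel atmosphere) form of this effect can be written down. It is not given explicitly in the text and is stated here as an assumption. The symbols are ours: R is the refraction angle, n(λ) the index of refraction of air at the observer, and Z the zenith angle.

```latex
R(\lambda) \approx \left[ n(\lambda) - 1 \right] \tan Z ,
\qquad
\Delta R(\lambda_1, \lambda_2) \approx
\left[ n(\lambda_1) - n(\lambda_2) \right] \tan Z .
```

Both expressions vanish at the zenith (tan Z = 0), and because n(λ) decreases with wavelength in the optical, the bluer of two wavelengths is refracted more.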

The automated corrections for the DCR effect to the
SDSS astrometry are computed as a function of a
broadband flux ratio. The DCR for any given object depends
on the object’s effective wavelength within a given
bandpass (set by the convolution of the object’s SED and
the filter transmission curve), which in
Table 4

NBC KDE Results: Test Set Classification of Spectroscopically
Confirmed Quasars

                   number inside redshift bin    number outside redshift bin
redshift bin       correct   total   fraction    correct   total   fraction
0.4 < z < 0.6           67      84      0.798      12788   13137      0.973
0.6 < z < 0.8          368     494      0.745      11855   12727      0.932
0.8 < z < 1.0          662     870      0.761      11704   12351      0.948
1.0 < z < 1.2          891    1043      0.854      11368   12178      0.934
1.2 < z < 1.4          949    1097      0.865      11307   12124      0.933
1.4 < z < 1.6         1100    1262      0.872      11147   11959      0.932
1.6 < z < 1.8         1085    1191      0.911      10766   12030      0.895
1.8 < z < 2.0          851    1078      0.790      11343   12143      0.934
2.0 < z < 2.2         1036    1278      0.811      11150   11943      0.934
2.2 < z < 2.4         1151    1322      0.871      10349   11899      0.870
2.4 < z < 2.6          996    1084      0.919      10572   12137      0.871
2.6 < z < 2.8          535     782      0.684      11866   12439      0.954
2.8 < z < 3.0          469     540      0.869      12093   12681      0.954
3.0 < z < 3.2          340     435      0.782      12377   12786      0.968
3.2 < z < 3.4          223     298      0.748      12587   12923      0.974
3.4 < z < 3.6          103     119      0.866      12933   13102      0.987
3.6 < z < 3.8          107     111      0.964      12966   13110      0.989
3.8 < z < 4.0           61      65      0.939      13026   13156      0.990

Note. Fraction of quasars inside the redshift bin correctly
classified as inside the redshift bin and quasars outside the redshift
bin correctly classified as outside the redshift bin from the
leave-one-out cross-validation of the training sets, using the training sets
divided into redshift bins.


Table 5

NBC KDE Results: Test Set Classification with Redshift Bins

                   QSO candidates           known QSOs returned
redshift bin       all     qso prob > 0.8   known QSOs   returned   fraction   qso prob > 0.8   fraction
0.4 < z < 0.6      2925       380                84          67      0.798           46         0.548
0.6 < z < 0.8      3433       801               494         367      0.743          293         0.593
0.8 < z < 1.0      3590       767               870         671      0.771          332         0.382
1.0 < z < 1.2      4775      1920              1043         883      0.847          567         0.544
1.2 < z < 1.4      6238      2981              1097         945      0.861          656         0.598
1.4 < z < 1.6      5543      2237              1262        1097      0.869          754         0.598
1.6 < z < 1.8      7838      3516              1191        1083      0.909          740         0.621
1.8 < z < 2.0      5931      2585              1078         840      0.779          574         0.533
2.0 < z < 2.2      5195      1948              1278        1034      0.809          582         0.455
2.2 < z < 2.4      4162      2354              1322        1146      0.867          895         0.677
2.4 < z < 2.6      4540      2477              1084         993      0.916          832         0.768
2.6 < z < 2.8      3023      1028               782         524      0.670          327         0.418
2.8 < z < 3.0      2246      1295               540         465      0.861          410         0.759
3.0 < z < 3.2      1390       753               435         334      0.768          260         0.598
3.2 < z < 3.4      1228       644               298         223      0.748          181         0.607
3.4 < z < 3.6      1122       671               119         102      0.857           99         0.832
3.6 < z < 3.8       596       399               111         106      0.955          106         0.955
3.8 < z < 4.0       514       348                65          60      0.923           58         0.892
Total             32108     20962             13153       10940      0.831         7712         0.586

Note. Classification of the full test set of objects, using the training sets divided into redshift
bins. Total will not be a sum of the above rows because many objects were classified in multiple
bins.


turn depends upon the filter’s transmission properties
and on the distribution of the source’s flux within the
bandpass. A pure power-law (without emission lines)
changes the effective wavelength in a correctable way,
but the DCR corrections become anomalous when there
are emission lines. For example, adding an emission line
on the blue side of the filter makes the effective
wavelength bluer, while adding an emission line on the red
side makes the effective wavelength redder. For emission
line objects (like quasars), the effective wavelength can
be very different from that of the assumed power law,
changing by as much as 150 Å in the u-band (Kaczmarczik
et al. 2009). The difference between the expected and
observed astrometric displacements due to DCR enables
the distinction of quasars and non-quasars in addition to
providing an additional source of information about the
redshift of the object. We examine the differential DCR
offset (along the parallactic angle; Filippenko 1982) in
the u-band (auPar) and in the g-band (agPar); the
effect is too small to measure in r, i, and z given the
astrometric errors of our data and the smaller DCR at longer
wavelengths.


Kaczmarczik et al. (2009) reduced the statistical
error in the astrometric offsets of individual objects by
Figure 9. Classification of a test set of quasars with known
spectroscopic redshifts, using the training sets divided into redshift
bins. Dark blue indicates all quasars in that bin, light blue
indicates quasars classified with the correct redshift. The ratio of the
two is the completeness of quasars inside the redshift bin.


normalizing the DCR offsets at multiple epochs (each 
with different airmass) to some fiducial airmass. Here 
we take a different approach that we find to be more 
robust. To first order, differential refraction is linear in 
tan(Z), where Z is the zenith angle, with zero intercept 
(no DCR at airmass of one at the zenith). Thus, a plot 
of multiple epochs of noisy quasar DCR measurements 
should cluster around a line with a fixed slope (for a given 
bandpass and object redshift) with zero intercept. 

In a manner similar to our structure function fitting
above, we use minimization of a log likelihood function to
calculate the astrometric parameters in the u and g bands.
We fit the data with a straight line that runs through the
origin and parameterize the DCR simply by the slope of
the line. The light curve is cleaned of outliers in the same
way as was done for the variability parameter calculation.
We require at least 10 good observations in each band
and at least one observation with airmass in the r band
greater than 1.5, which corresponds to tan(Z) ~ 1.1. This
is contrary to the variability analysis above, since here
higher airmass means a larger DCR signal despite greater
extinction. We weight each observation by the r-band
airmass since higher airmass observations are rarer and
should have greater discriminatory power. Further work
could be performed in the future to determine if this
weighting scheme is indeed optimal.
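
The fit described above can be illustrated with a closed-form weighted least-squares slope through the origin. This is a stand-in for the log-likelihood minimization used in the text, not a reproduction of it. The function names are hypothetical, a single airmass per observation is used both for the threshold and for the weight (the text uses the r-band airmass), and tan(Z) is derived from airmass with the plane-parallel relation airmass = sec(Z).

```python
import math

# Sketch: airmass-weighted, zero-intercept line fit of DCR offset vs tan(Z).
# Weighted least squares is assumed here in place of the paper's
# log-likelihood minimization; outlier cleaning is omitted.


def tan_z_from_airmass(airmass):
    """tan(Z) for a plane-parallel atmosphere, where airmass = sec(Z)."""
    return math.sqrt(airmass ** 2 - 1.0)


def dcr_slope(airmasses, offsets, min_obs=10, min_high_airmass=1.5):
    """Return the slope of offset vs tan(Z), or None if the cuts fail.

    Requires at least `min_obs` observations and at least one observation
    with airmass above `min_high_airmass`, as in the text.
    """
    if len(airmasses) < min_obs or max(airmasses) <= min_high_airmass:
        return None
    numerator = 0.0
    denominator = 0.0
    for airmass, offset in zip(airmasses, offsets):
        t = tan_z_from_airmass(airmass)
        numerator += airmass * t * offset   # weight = airmass
        denominator += airmass * t * t
    return numerator / denominator
```

An airmass of 1.5 gives tan(Z) of about 1.12, matching the threshold quoted above.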

Figure 12 shows an example of this process for a
single quasar with the u-band data in blue and the g-band
data in green. These astrometric data can be used to
constrain photometric redshifts for quasars in surveys where
there are many observations and/or observations at high
airmass that can provide constraints on the DCR slope.
See Figure 7 of Kaczmarczik et al. (2009). We will use the
astrometric parameters auPar and agPar in Section 6.3
when calculating the photometric redshifts of the quasar
candidates.

In Figure 13, left panel, we plot all of the empirical
DCR slopes for the quasar training set. The right panel
of Figure 13 shows that non-quasars and quasars have
somewhat different signals in this parameter space. We
have only included point sources in this analysis, but the
process should work for normal star forming galaxies too,
as the 4000 Å break can produce significant astrometric
shifts relative to the SED model assumed in the
astrometric solution. In this pilot investigation, we have not
used the DCR effect for classification; however, DCR
would add yet another piece of information that could be
used to refine the classification probabilities of the
objects in the test set. For example, objects with large
negative values of auPar are (empirically) more likely to
be non-quasars than quasars.


6.2. VISTA Hemisphere Survey

While we select objects only using optical imaging
data, we can make use of near-IR (NIR) photometry to
improve our photometric redshift estimates. The VISTA
Hemisphere Survey (VHS) is a near-infrared survey with
coverage in the southern hemisphere, including the full
Stripe 82 footprint. The second VHS public data
release (VHSDR2) was made available on the VISTA
Science Archive (VSA) in April 2014. These data include
three bands, J, H, and Ks, with (Vega) magnitude
limits of J = 20.2, H = 19.3, and Ks = 18.2 (McMahon
et al. 2013). Using the Rayleigh criterion, the surveys were
matched at 1.0" (Parejko et al. 2008); 48% of the quasar
candidates had matches in all three bands. It would be
beneficial to calculate photo-z estimates for the
remaining non-detections to put constraints on the quasar SED,
but that is beyond the scope of this work.


6.3. Photometric / Astrometric Redshifts

Empirical photometric redshifts (Richards et al. 2001)
were calculated for all of the objects that were found to
be potential quasars in Sections 5.1 or 5.2. The
algorithm is described in detail in Weinstein et al. (2004)
and essentially involves least-squares fitting (without
error weighting) between the candidate quasar colors and
the mean (sigma clipped) colors of quasars as a function
of redshift. The covariance matrix used in the process
was calculated using the quasars with known
spectroscopically determined redshifts. The quasars are binned
by redshift in bins of width 0.02. The mean color vector
and the color covariance matrix are found for the quasars
in each redshift bin; see Figure 4 of Richards et al. (2015,
submitted). For each of the quasar candidates, we
calculate how “far” its colors are from these calculated mean
colors and convert this information into a probability
distribution as a function of redshift bin, as shown in
Equation 5 of Weinstein et al. (2004). The peak of the
probability distribution is reported as the photometric
redshift and the confidence is calculated by integrating
under the curve down to a threshold. A few examples of
photometric redshift PDFs are shown in Figure 14.
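
The idea of converting a color "distance" into a redshift PDF can be sketched as follows. This is a simplified illustration, not the Weinstein et al. (2004) implementation: it assumes a diagonal covariance (independent Gaussian scatter in each color) in place of the full color covariance matrix, and all names are hypothetical.

```python
import math

# Sketch: redshift PDF from colors, assuming a diagonal color covariance.
# mean_colors[k] and color_variances[k] hold the mean and variance of each
# color for redshift bin k (hypothetical inputs).


def photoz_pdf(colors, mean_colors, color_variances):
    """Return a normalized probability for each redshift bin."""
    weights = []
    for means, variances in zip(mean_colors, color_variances):
        chi2 = sum((c - m) ** 2 / v
                   for c, m, v in zip(colors, means, variances))
        # Gaussian normalization; sqrt(prod(variances)) is the square
        # root of the determinant of a diagonal covariance matrix.
        norm = math.sqrt(math.prod(variances))
        weights.append(math.exp(-0.5 * chi2) / norm)
    total = sum(weights)
    return [w / total for w in weights]


def peak_bin(pdf):
    """Index of the redshift bin at the peak of the PDF."""
    return max(range(len(pdf)), key=pdf.__getitem__)
```

With bins of width 0.02, the index returned by `peak_bin` maps directly to a photometric redshift. An object whose colors are far from every bin mean can underflow all weights to zero, which a fuller implementation would need to guard against.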
First, the photometric redshift was calculated using
SDSS adjacent colors (u - g, g - r, r - i, i - z). The
mean colors were calculated using all MQC objects with
known spectroscopic redshifts (i.e., not just the Stripe 82
quasars) using coadded photometry when available. We
did this to improve the constraints on the photometry


http://horus.roe.ac.uk/vsa/index.html
Figure 10. Comparison of spectroscopic redshift to the bin into which known quasars were classified with the highest probability. Left
panel: Spectroscopic redshift vs. the most probable redshift bin. Right panel: Histogram of Δz (the most probable redshift bin minus the
spectroscopic redshift). Only 5.6% of the quasars have |Δz| > 0.5.


for high-redshift quasars. Those objects without
coadded photometry have larger photometric errors, but the
increase in the number of objects overcomes the noise.
The color-based photo-z PDFs of 4 representative objects
are shown in green in Figure 14. The 13,419 quasars on
Stripe 82 with spectroscopic redshifts are shown in
Figure 15 (top left panel). Of these objects, 5,843 (43.5%)
have a calculated photometric redshift within 0.1 of the
spectroscopic redshift and 10,201 (76.0%) are within 0.3,
as seen in Figure 16. The quasars around redshift 0.8
and 2.2 have particularly poor photometric redshifts
because of a color-redshift degeneracy. This is described in
detail in Section 4.2.3 of Weinstein et al. (2004).

Next, a redshift based on the astrometric data (the
astrometric redshift) was calculated using the parameters
described in Section 6.1. The mean vector and the
covariance matrix were calculated using auPar and agPar,
using the same method as for the SDSS adjacent colors.
The astrometric redshift PDF is shown in orange in
Figure 14. The 13,028 quasars on Stripe 82 with
spectroscopic redshifts and for which we were able to calculate
astrometric redshifts are shown in Figure 15 (top right
panel). This process gives poorer redshift estimates than
the SDSS photometric redshifts, but the purpose is to
break degeneracies in the photometric redshifts by
combining photometric and astrometric information. That
is, the astrometric redshift serves as an informative prior.

Next, the astrometric redshift PDFs and the
photometric redshift PDFs are combined using weighted
averages in a similar manner as Carrasco Kind & Brunner
(2014) (Section 3.1.2 and Equation 7) to make
astro-photometric redshifts. Specifically, we have combined
the PDFs by adding rather than multiplying in order
to enable a relative weighting of the two PDFs. In
future work, we will consider a multiplicative joining of
the data with smoothing to provide relative weighting.
The colors curve is given five times the weight of the
astrometry curve, a ratio chosen based on empirical
experiments with different weights. The resulting curve is
shown in Figure 14 in purple. When the photometric
redshifts returned by the colors alone are inconsistent with
the spectroscopic redshifts, the correct redshift is generally
one of the secondary peaks in the color-based PDF. The
astrometric-redshift PDF generally has a plateau at one
end of the redshift range or several large peaks. When
the two PDFs are combined, the astrometric information
pulls out the correct peak in the color-based PDF as the
best estimate of the redshift. The 13,028 training set
quasars in Stripe 82 with spectroscopic redshifts and
astrometric values are shown in Figure 15 (bottom left
panel). Of these objects, 6,717 (51.6%) have a calculated
astro-photometric redshift within 0.1 of the spectroscopic
redshift and 10,010 (76.8%) are within 0.3, as seen in
Figure 16.
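
The additive combination can be sketched directly. The five-to-one weighting and the final renormalization follow the text and the Figure 14 caption; the function name, the list-based PDF representation, and the example numbers are hypothetical.

```python
# Sketch: weighted additive combination of two redshift PDFs sampled on
# the same redshift grid, followed by renormalization.


def combine_pdfs(color_pdf, astro_pdf, color_weight=5.0, astro_weight=1.0):
    """Return the renormalized weighted sum of the two PDFs."""
    combined = [color_weight * c + astro_weight * a
                for c, a in zip(color_pdf, astro_pdf)]
    total = sum(combined)
    return [value / total for value in combined]


# Illustrative (made-up) case: the colors cannot choose between the first
# and last bins, and the astrometry favors the last one.
color_pdf = [0.46, 0.08, 0.46]
astro_pdf = [0.10, 0.10, 0.80]
astro_photo_pdf = combine_pdfs(color_pdf, astro_pdf)
```

In this toy case the combined PDF peaks in the last bin, which is how a broad astrometric PDF can select among degenerate peaks in the color-based PDF.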

Finally, for the 17,321 quasar candidates with matches
to the VHS catalog (about 48%; see Section 6.2), the
photometric redshift was calculated using the SDSS and
VHS adjacent colors (u - g, g - r, r - i, i - z, z - J, J - H,
H - K). The 9,244 quasars on Stripe 82 with
spectroscopic redshifts and matches to VHS data are shown in
Figure 15 (bottom right panel). Of these objects, 4,951
(53.6%) have a calculated photometric redshift within
0.1 of the spectroscopic redshift and 7,250 (78.4%) are
within 0.3, as seen in Figure 16.
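
The quality metric used for all three redshift estimates is the fraction of quasars whose estimate lies within 0.1 or 0.3 of the spectroscopic redshift. A minimal sketch follows; the function and dictionary names are hypothetical, and the counts are the ones quoted in this section.

```python
# Sketch: fraction of objects with |z_phot - z_spec| below a tolerance,
# plus a check of the percentages quoted in the text.


def fraction_within(z_spec, z_phot, tolerance):
    """Fraction of pairs with |z_phot - z_spec| < tolerance."""
    pairs = list(zip(z_spec, z_phot))
    hits = sum(1 for spec, phot in pairs if abs(phot - spec) < tolerance)
    return hits / len(pairs)


# (number within 0.1, number within 0.3, sample size), as quoted above.
QUOTED_COUNTS = {
    "optical colors": (5843, 10201, 13419),
    "astro-photometric": (6717, 10010, 13028),
    "optical + NIR colors": (4951, 7250, 9244),
}


def quoted_percentages():
    """Percentages within 0.1 and 0.3 for each method, to one decimal."""
    return {name: (round(100 * n01 / n, 1), round(100 * n03 / n, 1))
            for name, (n01, n03, n) in QUOTED_COUNTS.items()}
```

Note that the three samples differ in size, so the percentages are not measured on identical sets of quasars.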

Figure 16 demonstrates that adding either NIR colors
or astrometric information significantly improves the
redshift estimates over using only optical colors (53.6%
and 51.6% within 0.1 of the spectroscopic redshift,
respectively, compared to 43.5%). Comparison of the
continuously-determined redshifts versus the discrete
redshift binning from Section 5.2 suggests that the binning
method is somewhat more accurate (in terms of having
fewer outliers), but not as precise as the astro-photometric
redshifts or optical+NIR photometric redshifts.

We graphically summarize the quality of the
photometric redshifts in Figure 17 by showing the distribution
of true redshifts within a given photometric redshift
bin. The photometric redshift bins were chosen to match
those of the Richards et al. (2006) quasar luminosity
function. It will be necessary to correct for such
photometric redshift errors before determining the quasar
luminosity function in Section 8.3. We find that objects
with photometric redshifts of z ~ 1.25 and z ~ 3.25 are
particularly robust, whereas the z ~ 0.85 objects are
often mistaken for z ~ 2.2. This is caused by degeneracies
in color-redshift space. As shown in Figure 1 of Richards
et al. (2001), the colors of particular quasars can fall
within the 1σ distribution of the color-redshift relation
at many redshifts. Using all four SDSS colors decreases
the areas of degeneracy and adding IR colors or
astrometry decreases them still further. The degeneracies found
in this work are similar to those described in Section 3.4
of Richards et al. (2001).

Overall, we find that optical+NIR magnitudes can
improve the photometric redshift accuracy; however, with
astro-photometric redshifts we can surpass the
improvements due to NIR data alone.

Figure 11. As Figure 8, color and variability space plots showing the results of test set classification, but using redshift bins (described
in Section 5.2). In the bottom panels, we find that the selection in variability parameter space shows no noticeable difference to Figure 8,
which is not surprising as A_g vs. γ_g and A_r vs. γ_r have no strong redshift trends. However, there are slight differences in color space (top
panels). We discuss these further in Section 7.

7. CATALOG

From the classification test set, described in
Section 5, we present a FITS catalog of the 36,569 objects
classified as quasars in either Section 5.1 or 5.2. The
number of objects and their origin (5.1 or 5.2) is
summarized in Table 6 and a description of the columns in
the binary FITS catalog table is provided for reference
in Appendix A. The catalog is available online.

Another Bayesian selection method using optical and
mid-infrared (MIR) colors (Richards et al. 2015,
submitted) was able to clean out contaminating bright stars
using some simple color cuts. We similarly use MIR color
cuts to clean bright stars out of our final candidate list.
To do so, we matched the quasar candidate catalog to
the WISE ALLWISE data release^15. Of our candidates,
19,720 (53.9%) had matches in both W1 and W2 (AB

15 wise2.ipac.caltech.edu/docs/release/allwise/ 
Figure 12. The measured astrometric offset along the parallactic
angle as a function of tan(Z). Shown is SDSS J013417.81-005036.2,
a redshift 2.26 quasar from SDSS Stripe 82, the same object shown
in Figure 2. This quasar is shown as an example representative of
the data set. Each point refers to a different observation of this
object, at a different airmass. The astrometric accuracy is ~ 0.03
arcsec for g < 20.0, but up to 0.1 arcsec for g ~ 22.0 (Pier et al.
2003). The u-band observations are shown in blue, with those points
that were removed as outliers from the light curve in Figure 2
shown in red. The g-band observations are shown in green, with outliers
removed from the light curve shown in orange. The fits, shown as
solid blue and green lines, have a y-axis intercept of zero. For this
quasar, the slope of the line (offset along the parallactic angle) in
the u-band (auPar) is -0.055 and in the g-band (agPar) is 0.105. The
astrometric redshift is found to be 2.57.


magnitudes). For these objects, we made the following
cuts:

$i < 19.5$ (10)

$i < -5.5\,(W1 - W2) + 19.5$ (11)

following Richards et al. (2015, submitted) and using the
coadded i magnitude. This process identified 573
candidates that are flagged as likely stellar contaminants in
the catalog as noted in Table 7. The majority of these
objects have colors that are consistent with the stellar
locus and have a mean i magnitude of 16.8.
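
Equations 10 and 11 translate into a one-line test per object. The sketch below assumes that both inequalities must hold for an object to be flagged; the function name is hypothetical.

```python
# Sketch: flag likely stellar contaminants using Eqs. 10 and 11.
# Requiring both inequalities at once is an assumption about how the two
# cuts are combined.


def is_likely_star(i_mag, w1, w2):
    """True if the coadded i magnitude and W1 - W2 color satisfy both cuts."""
    return i_mag < 19.5 and i_mag < -5.5 * (w1 - w2) + 19.5
```

For example, an object with i = 16.8 and W1 - W2 = 0.0 is flagged, while one with i = 19.0 and W1 - W2 = 0.5 is not, because the second cut requires i < 16.75 at that color.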

Most white dwarf contaminants are below WISE
detection thresholds. Thus, to eliminate these contaminants
we made the following optical color cut, guided by the
SDSS white dwarf catalog of Pietro Gentile Fusillo et al.
(2015):

$(r - i) < -0.62\,(g - r) - 0.37$. (12)

We used the coadded magnitudes and confirmed that
this cut would remove none of the spectroscopically
confirmed quasars from our training set. It removes 48% of
the known white dwarfs in Pietro Gentile Fusillo et al.
(2015) and identified 178 quasar candidates as possible
white dwarfs. These candidates are flagged as likely
white dwarf contaminants in the catalog as noted in
Table 7. These possible white dwarfs are all in the bluest
corner of g - r vs. r - i color space and have a mean i
magnitude of 21.7.
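
Equation 12 can be applied in the same way. The function name is hypothetical and the example magnitudes are made up for illustration.

```python
# Sketch: flag possible white dwarfs using the color cut of Eq. 12,
# applied to coadded g, r, i magnitudes.


def is_possible_white_dwarf(g_mag, r_mag, i_mag):
    """True if (r - i) < -0.62 (g - r) - 0.37."""
    return (r_mag - i_mag) < -0.62 * (g_mag - r_mag) - 0.37
```

A blue object with g - r = -0.4 and r - i = -0.3 is flagged (the threshold at that g - r is about -0.12), whereas an object with g - r = 0.2 and r - i = 0.1 is not.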

All together, after the ALLWISE and white dwarf cuts,
there are a total of 35,820 “good” quasar candidates
in Stripe 82. (Perform the following query to retrieve
these objects from the catalog: WISEcut_label == 0 &
WDcut_label == 0 & candidate_label == 1.) These
candidates are used in the analysis that follows.
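
The query above amounts to a boolean filter on three catalog columns. The sketch below applies it to rows held as plain dictionaries, which is an assumption made to keep the example self-contained. Reading the FITS table itself is left to whatever FITS library is in use, and the column spellings follow the query as written here.

```python
# Sketch: select the "good" candidates from catalog rows.
# Rows are assumed to be dicts keyed by column name (hypothetical layout).


def select_good_candidates(rows):
    """Keep rows that pass both contaminant cuts and are candidates."""
    return [row for row in rows
            if row["WISEcut_label"] == 0
            and row["WDcut_label"] == 0
            and row["candidate_label"] == 1]
```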

Classification over the whole redshift range (as
described in Section 5.1) returned 33,240 quasar
candidates, or 3.63% of the 916,587 objects in the test set,
roughly consistent with the prior of 5%. Of the 13,221
spectroscopically confirmed quasars that could have been
returned, we found 12,898 (97.6% completeness).
Classification in redshift bins (Section 5.2) returned 31,600
objects as potential quasars. Of the 13,221
spectroscopically confirmed quasars that could have been returned,
we found 12,511 (94.6% completeness). Thus, our
attempts at simultaneous classification and redshift
estimation are somewhat less complete than our efforts to
classify quasars regardless of redshift. Counting a quasar
as recovered if either method selects it, we found 12,953
of the 13,221 spectroscopically confirmed quasars that
could have been returned (98.0% completeness).

Of the candidates, 29,020 (81.0%) were identified by
both methods. As shown in Figures 8 and 11, the quasars
selected using these two methods show similar
distributions. In the bottom panels, we find that the selection
in variability parameter space shows no noticeable
difference, which is not surprising as A_g vs. γ_g and A_r vs. γ_r
have no strong redshift trends. However, there are slight
differences in color space (top panels). Using the quasar
training set in redshift bins we select more g - r > 1.0,
u - g < 2.0, and z_phot > 3.4 quasar candidates, many of
them potential contaminant stars. Using the full redshift
range we select more objects in the bluest corner of g - r
vs. r - i space, many of them flagged as potential white
dwarf contaminants.

As described in Section 2.2, the SDSS I/II quasars were
primarily color-selected to i < 19.1 for low-redshift and
to i < 20.2 for high-redshift (Richards et al. 2002), but
the target selection on Stripe 82 was deeper, initially
going to i = 19.9 for low-redshift and i = 20.4 for
high-redshift; later to i = 20.2 for low-redshift sources and i =
20.65 for radio sources; and later to i < 21.0 for variable
sources (Adelman-McCarthy et al. 2006). As such, when
we consider the completeness of previous spectroscopic
observations on Stripe 82, it is important to consider the
magnitude of the objects. The “good” quasar candidates
are shown in Figure 18. Note the change in character of
the new quasar candidates at i ~ 20.0.


According to Vanden Berk et al. (2005), the
completeness of the SDSS quasar selection algorithm for sources
with i < 19.1 is C_Q = 94.9^{+2.6}_{-3.8}% at the 90%
confidence level. We will consider the completeness of
existing quasar spectroscopy on Stripe 82 both brighter and
fainter than this limit. Our region of selection extends
beyond the region of uniform spectroscopic follow-up by
SDSS, -10° < RA < 50°; therefore, in order to do this
comparison, we must limit our examination to this
region. This includes 12,107 of the 22,867 “good” quasar
candidates in the catalog that are not spectroscopically
confirmed. There are 1,090 (3,183) spectroscopically
confirmed quasars brighter than a coadded i-band
magnitude of 19.1 (19.9) and we find 61 (192) additional quasar
candidates. Assuming that all of our new “good”
candidates are real, this completeness of 94.7% (94.3%) agrees
well with Vanden Berk et al. (2005). However, we might
Figure 13. Slope of the line (offset along the parallactic angle) in the u-band (auPar) and g-band (agPar) as
a function of redshift for the quasar sample (left panel) and as a function of magnitude for non-quasars (right panel). Left panel: While
the changes in these astrometric parameters are not as strong as the changes in color with redshift, they provide another source of redshift
information. Right panel: The differences between the distributions in the left panel and right panel can aid in the separation of quasars
from non-quasars. See Section 9. For example, objects with large negative values of auPar are more likely to be non-quasars than quasars.


Table 6

Quasar Candidates

                          Candidate Quasars    w/ spectra              w/o spectra
Data Set                  Total    Fraction    Total   Completeness    Total    i < 19.9  (fraction)   i > 19.9  (fraction)
All Candidates            36569    0.040       12953   0.980           23616      1570     0.066        22046     0.934
Whole Redshift Range      33673    0.037       12898   0.976           20775      1048     0.050        19727     0.950
Redshift Bins             32108    0.035       12511   0.946           19597      1282     0.065        18315     0.935
Both Methods              29212    0.032       12456   0.942           16756       760     0.045        15996     0.955
After WISE and WD Cut     35820    0.039       12953   0.980           22867       991     0.043        21876     0.957


have expected it to be higher given the additional
spectra taken on Stripe 82 since 2005 as part of the BOSS
program.
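
The completeness figures quoted for the bright samples follow from the confirmed and new-candidate counts under the stated assumption that every new "good" candidate is a real quasar. A short check, with a hypothetical function name:

```python
# Sketch: completeness of existing spectroscopy, assuming all new
# candidates are real quasars (the assumption made in the text).


def spectroscopic_completeness(n_confirmed, n_new_candidates):
    """Confirmed quasars divided by confirmed plus new candidates."""
    return n_confirmed / (n_confirmed + n_new_candidates)


bright_19_1 = spectroscopic_completeness(1090, 61)    # i < 19.1
bright_19_9 = spectroscopic_completeness(3183, 192)   # i < 19.9
```

These evaluate to about 0.947 and 0.943, matching the 94.7% (94.3%) quoted above.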

Fainter than this limit, it could be that quasars are not 
being targeted or that there simply have not been enough 
fibers devoted to quasar candidates to find all of the 
objects that we consider to be valid quasar candidates. 
There are 4,591 spectroscopically-confirmed quasars dim¬ 
mer than a coadded i-band magnitude of 19.9 and with 
a redshift z < 3.0. To this we add 9,536 quasar candi¬ 
dates with astro-photometric redshift z < 3.0. There are 
561 spectroscopically-confirmed quasars dimmer than a 
coadded i-band magnitude of 19.9 and with a redshift 
z > 3.0. To this we add 576 quasar candidates with 
astro-photometric redshift z > 3.0.

Figure 19 shows the completeness and new quasar
selection as a function of redshift. The left panel shows the
quasars and candidates for i < 19.9 and the right panel
shows i > 19.9. In short, we have shown that current
methods (only colors, only variability, and other techniques
used for Stripe 82 target selection) are still incomplete.
Next-generation surveys like LSST will have to adopt more
sophisticated methods, of which ours is just a pilot
example, to fully exploit the data.

While we find new quasars in Stripe 82, the catalog also
includes 466 objects that were not selected by our
algorithms as quasar candidates, but that are
spectroscopically confirmed quasars. This incompleteness
demonstrates where there is room for improvement beyond
our pilot project. For the sake of completeness, to
illustrate where we may be less sensitive, and to make
it easier to compute the completeness corrections for
our catalog without needing another data source, these
quasars are included in our catalog. They are indicated
by candidate_label == 0. In general, they are in the
densest part of the stellar locus and have very small γ_g
and γ_r values. More than 50% are between redshifts
2.2 < z < 3.2 and more than a third have i > 21.5,
compared to 5% and 9% of the quasar training set as a whole,
making these objects particularly difficult to distinguish
as quasars.


8. DISCUSSION 


We will now explore the quality of the quasar catalog
by comparing to other cuts and catalogs, in addition to
evaluating it for remaining contaminants. In Section 8.1
we use the quasar variability selection box from Schmidt
et al. (2010) as a comparison in time domain
classification. In Section 8.2 we evaluate how well our algorithm
recovers quasars from the BOSS DR10 and DR12 quasar
catalogs. Finally, in Section 8.3 we evaluate completeness
and contamination of the candidate quasars using
number counts and luminosity function analysis.

Figure 14. Four selected example quasars demonstrating the photometric redshift probability function using the SDSS colors (green), 
astrometry (orange), and astro-photometric (purple). The expectation value of the SDSS colors PDF is shown as a vertical green line and 
the peak of the astro-photometric PDF is shown as a vertical purple line. The spectroscopic redshift is shown as a vertical black line. The 
colors curve is given five times the weight of the astrometry curve, then the two are added, and finally renormalized to create the purple 
curve. The top two panels demonstrate how, when the photometric redshifts returned by the colors are inconsistent with the spectroscopic 
redshifts, the colors often return the spectroscopic redshift as one of the secondary peaks in the PDF. The astrometric PDF generally has 
several large peaks or an extended plateau. When the two PDFs are combined it often pulls out the correct peak in the colors PDF. In 
the top panel, the tertiary peak of the astro-photometric PDF correctly identifies the spectroscopic redshift for a low-redshift quasar where 
colors alone failed; a different weighting of the colors and astrometry PDFs might have picked up the correct peak. In the second panel, 
the primary peak of astro-photometric PDF identifies the spectroscopic redshift for a high-redshift quasar where colors alone failed. The 
third panel shows how the astrometry PDF helps to identify which peak in the colors PDF is correct. The bottom panel shows how a 
broad plateau in the colors PDF converges to the spectroscopic redshift by the addition of the astrometry PDF information. 
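
The weighting described in this caption (colors PDF given five times the weight of the astrometry PDF, summed, then renormalized) can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the catalog: the function names, the toy PDF values, and the assumptions that both PDFs are sampled on the same redshift grid and normalized to unit sum before weighting are ours.

```python
def normalize(pdf):
    """Scale a sampled PDF so that its values sum to 1."""
    total = sum(pdf)
    if total <= 0:
        raise ValueError("PDF must have positive total probability")
    return [p / total for p in pdf]


def combine_pdfs(colors_pdf, astrometry_pdf, colors_weight=5.0):
    """Weighted sum of two PDFs on a shared redshift grid, renormalized."""
    if len(colors_pdf) != len(astrometry_pdf):
        raise ValueError("PDFs must share the same redshift grid")
    colors = normalize(colors_pdf)
    astro = normalize(astrometry_pdf)
    summed = [colors_weight * c + a for c, a in zip(colors, astro)]
    return normalize(summed)


# Toy example on a 5-point redshift grid (values are illustrative only).
z_grid = [0.5, 1.0, 1.5, 2.0, 2.5]
colors_pdf = [0.05, 0.40, 0.05, 0.45, 0.05]      # two competing peaks
astrometry_pdf = [0.10, 0.60, 0.10, 0.10, 0.10]  # broad, favors z = 1.0
combined = combine_pdfs(colors_pdf, astrometry_pdf)
best_z = z_grid[combined.index(max(combined))]
```

In this toy case the colors PDF alone peaks at z = 2.0, while the combined PDF peaks at z = 1.0, which mimics the way the astrometry PDF can promote a secondary peak of the colors PDF.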


Figure 15. Spectroscopic redshift vs. photometric redshift in hex bins with logarithmic gray scale, using ( top left panel) SDSS colors 
(both single epoch and coadded, when available), (top right panel) astrometry, ( bottom left panel) SDSS and astrometry PDFs combined, 
and ( bottom right panel) SDSS and VHS adjacent colors. This illustrates those redshifts where the algorithm has the largest error rate 
(either due to degeneracy between distinct redshifts or smearing of nearby redshifts). The bottom left panel demonstrates that when the 
photometric redshifts returned by the SDSS colors are inconsistent with the spectroscopic redshifts, the addition of the astrometry PDF 
often pulls out one of the secondary peaks in the SDSS PDF as the spectroscopic redshift. The bottom right panel demonstrates how 
optical+IR magnitudes can similarly improve the photometric redshift accuracy. However, with the addition of astrometry we can surpass 
the improvements due to IR data. 

Figure 16. Normalized histogram of the difference between
spectroscopic redshift and photometric redshift for quasars. Note how
the distribution tightens toward Δz = 0.0 from the SDSS color
photometric redshifts to the astro-photometric redshifts. Shown
are SDSS colors (green), SDSS colors and astrometry (purple), and
SDSS and VHS colors (orange). Shown in solid black is the
histogram of classification in redshift bins from Figure 10.


8.1. Comparison to Other Variability Based Selection 

First we compare our results to the performance of
the (variability-based) quasar selection box (in A and γ
space) defined in Equations 7 - 9 of Schmidt et al. (2010):


$$\gamma_r = 0.5\,\log(A_r) + 0.50 \tag{13}$$
$$\gamma_r = -2.0\,\log(A_r) - 2.25 \tag{14}$$
$$\gamma_r = 0.055 \tag{15}$$
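
As a hedged illustration of how such a box might be applied to fitted structure-function parameters, the sketch below tests a single (A_r, γ_r) pair against the three boundary lines of Equations (13) - (15). The text gives only the boundary lines, so two points are assumptions on our part and should be checked against Schmidt et al. (2010): that the selected region lies above all three lines, and that log denotes the base-10 logarithm. The function name is hypothetical.

```python
import math


def in_variability_box(a_r, gamma_r):
    """Return True if (A_r, gamma_r) lies above all three boundary lines.

    ASSUMPTION: the selected region is gamma_r greater than each line, and
    log denotes log10; neither is stated explicitly in the text above.
    """
    if a_r <= 0:
        return False  # log10 is undefined; treat the object as not selected
    log_a = math.log10(a_r)
    return (
        gamma_r > 0.5 * log_a + 0.50       # Equation (13)
        and gamma_r > -2.0 * log_a - 2.25  # Equation (14)
        and gamma_r > 0.055                # Equation (15)
    )
```

For example, under these assumptions `in_variability_box(0.1, 0.3)` selects the object, while an object with the same amplitude but γ_r = 0.01 falls below the floor of Equation (15) and is rejected.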


Using Stripe 82 data, Schmidt et al. (2010) achieve a
completeness of 90% and an efficiency of 96% with this
box. Applying the same cuts to our own training sets, as
shown in Figure 20, left panel, results in 87% completeness
and 74% efficiency. We obtain very different results
because we have very different quasar and non-quasar
data sets. Schmidt et al. (2010) used quasars with 15.4 <
i < 22.0 with a mean of 19.5 and only 5000 bright
F/G-star colored objects with 0.2 < g − r < 0.48 and 14.0 <
g < 20.2. We used quasars with 15.9 < i < 22.7 with
a mean of 20.2 and 72,680 non-quasars (not just F/G
stars) with 14.8 < g < 25.5 and a mean of 20.6.

Applying these cuts instead to our full test set, as
shown in Figure 20, right panel, gives 49,649 quasar
candidates. Of these, 23% are spectroscopically confirmed
quasars and another 27% are objects that we identified
as quasar candidates in either Section 5.1 or 5.2
(with the remainder being previously unidentified potential
new candidates). If all of our previously identified
candidates were actually quasars and the remaining
objects identified by these cuts were instead contaminants,
then the efficiency of this variability quasar selection box
would be 50% and the completeness would be 69%. The
majority of the quasar candidates outside the box are
dimmer than a coadded i-band magnitude of 20, where
most variability is below the noise level.
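
The efficiency and completeness quoted in this paragraph follow from simple ratios. The sketch below is a worked check rather than the authors' calculation; in particular, the completeness denominator is not stated in the paragraph, and we assume it is the 35,820 candidates in our catalog, which reproduces the quoted value.

```python
n_box = 49_649             # objects inside the variability box (test set)
frac_confirmed = 0.23      # spectroscopically confirmed quasars
frac_candidates = 0.27     # our candidates from Section 5.1 or 5.2
n_our_candidates = 35_820  # ASSUMED denominator: all candidates in the catalog

# Objects in the box that we also identify as quasars or quasar candidates.
n_recovered = (frac_confirmed + frac_candidates) * n_box

efficiency = n_recovered / n_box               # fraction of the box we trust
completeness = n_recovered / n_our_candidates  # fraction of our list in the box
print(f"efficiency = {efficiency:.0%}, completeness = {completeness:.0%}")
```

With these inputs the two ratios round to the 50% and 69% given in the text.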


This comparison suggests that selection by variability
alone, while working well to discriminate between
relatively bright F/G stars and quasars, is incomplete when
using a realistic sample of non-quasar contaminants, and
that our hybrid approach of combining color and
variability will yield better results for future surveys.

Graham et al. (2014) compare the performance
of variability selection using a power law fit to the
structure function (SF), a DRW fit, and Slepian wavelet
variance (SWV). Using the power law fit to the SF (SWV)
to classify quasars on Stripe 82 they achieve 92% (86%)
completeness and 93% (92%) efficiency.


8.2. BOSS Quasar Selection 

As described in Section 2.2, in addition to color
selection, some of the BOSS quasars on Stripe 82 were
targeted using an algorithm based on the same
parameterization of variability used herein. We matched our
candidate catalog to the SDSS-III/BOSS Data Release
10 Quasar Catalog (DR10Q; Paris et al. 2014) to see
how well we recovered these quasars. These quasars are
indicated by DR10_label == 1. There are 9,590 quasars
on Stripe 82 in DR10Q, of which 7,241 were point sources that
met the quality cuts to be included in our test set. Of
these 7,241 known quasars, we recovered 7,034 (97.1%
completeness) as candidate quasars. The quasars we
missed have i < 22.0 with a mean of i = 20.0 and have
γ_g < 0.25: they are much less variable than the quasar
training set on average.

We found 6,562 quasar candidates in the BOSS
redshift range (2.2 < z < 3.5) based on astro-photometric
redshifts. Of these, 49% are training set quasars with
spectroscopic redshifts 2.2 < z < 3.5 (i < 22.7 with mean
i = 20.7) and another 3% are known quasars with
spectroscopic redshifts outside this range. Of the remaining
48% (3,157 quasar candidates), 1,614 are high probability
candidates with qso_prob > 0.8. These are the
objects that are highly likely to be quasars that BOSS has
missed, which is consistent with the known incompleteness
of BOSS (Ross et al. 2012). Our high probability
candidates have i < 23.0 with a mean of i = 21.4,
suggesting that we are able to extend our selection to less
luminous objects using the combined color and
variability approach.

Since our test set was built, the twelfth data release
quasar catalog of SDSS-III was made public (DR12Q;
Paris et al. 2015, in prep). Since DR10Q, additional
spectroscopic plates were taken on Stripe 82, resulting
in 2,054 DR12Q quasars on Stripe 82 that are not in
the quasar training set; of these, 1,162 were point sources
that met the quality cuts to be included in our test set. We
matched our candidate catalog to DR12Q to see how
well we recovered these new quasars. These objects are
indicated by DR12_label == 1. Of the quasars new in
DR12Q, we recovered 1,141 (98.2% completeness). The
objects that were missed have i < 22.1 with a mean of
i = 21.3 and have γ_g < 0.33. Again, they are much less
variable than the quasar training set on average.


8.3. Number Counts and the Luminosity Function

In Figure 21 we reproduce the number counts analysis
shown in Figure 9 of Richards et al. (2009a), using our
candidate quasars. The counts have been corrected for

Figure 17. Normalized histogram of spectroscopic redshift in panels based on bins of photometric redshift from 0.3 to 5.0 in the same
bins as the luminosity function in Section 8.3. These panels demonstrate which photometric redshift ranges are most unreliable and most
reliable. Photometric redshifts were calculated using SDSS colors (green), SDSS colors and astrometry (purple), and SDSS and VHS colors
(orange). In particular, note the bimodal distribution at 0.68 < z_phot < 1.06 compared to the precision at 1.06 < z_phot < 1.44 and
3.0 < z_phot < 3.5. This bimodality is caused by degeneracies in color-redshift space. We correct for photometric redshift errors when
calculating the quasar luminosity function in Section 8.3.


incompleteness as given by the fraction of MQC quasars
recovered as shown in Figure 21, left panel. In short, the
correction is the ratio of known (MQC) quasars to those
recovered as quasar candidates. This process corrects for
objects with too few observations to calculate variability
parameters, the exclusion of extended sources, and
incompleteness in the selection algorithm. The right panel
shows the number of quasars per deg^2 and per 0.25 mag as a
function of coadded i-band magnitude. Open points
represent the raw number counts, while the closed points give
the completeness-corrected number counts. The turnover
at i = 19.9 is due to the incompleteness of the
spectroscopic sample. This analysis suggests that our
selection algorithm is neither heavily contaminated (e.g., as
might be evidenced by a large excess of bright objects
versus known quasars) nor very incomplete, since the
corrected counts agree well with the spectroscopic quasar
distribution.
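
The number-count correction described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for Figure 21: the function names are hypothetical, the magnitudes and recovered fractions are toy values, and we assume that "correcting" means dividing the raw counts in each magnitude bin by the fraction of known quasars recovered in that bin (equivalently, multiplying by the ratio of known to recovered quasars).

```python
def counts_per_deg2(i_mags, area_deg2, mag_min, mag_max, bin_width=0.25):
    """Raw number counts per deg^2 in magnitude bins of width bin_width."""
    n_bins = round((mag_max - mag_min) / bin_width)
    counts = [0] * n_bins
    for mag in i_mags:
        k = int((mag - mag_min) // bin_width)
        if 0 <= k < n_bins:  # ignore objects outside the magnitude range
            counts[k] += 1
    return [c / area_deg2 for c in counts]


def correct_counts(raw_counts, recovered_fraction):
    """Divide each bin by the fraction of known quasars recovered in it."""
    return [
        raw / frac if frac > 0 else None  # no correction possible if frac = 0
        for raw, frac in zip(raw_counts, recovered_fraction)
    ]


# Toy inputs: four candidates over 2 deg^2, bins from i = 19.0 to 20.0.
raw = counts_per_deg2([19.1, 19.2, 19.3, 19.6], 2.0, 19.0, 20.0)
corrected = correct_counts(raw, [0.5, 1.0, 0.25, 0.0])
```

Bins in which no known quasars are recovered cannot be corrected this way and are returned as `None` here.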

Next, we calculate the quasar luminosity function
(QLF) for our candidate quasars. This QLF calculation
was not intended to be a scientific result of this
pilot project, as we expected more incompleteness and
contamination than shown in Figure 21. However, the
result does suggest that accurate determination of the
QLF will be possible with photometric selection from
LSST and other next-generation surveys.

Figure 18. Histogram of coadded i band magnitude for known
Stripe 82 quasars and new quasar candidates. In purple are the
previously known, spectroscopically confirmed quasars returned by
the selection. The quasar candidates returned by the selection are
shown in orange and the new quasar candidates are shown in green.

In order to compare space densities at different
redshifts, we must correct our photometry for the effects of
redshift on the portion of the spectrum sampled by a
given filter. We do this by using a mean K-correction for
z = 2 in the i band as described in Richards et al. (2006,
Section 5).


As we have seen, and as discussed in Ross et al. (2013,
Section 3.4.1), variability selection is less biased than
color selection, but we cannot assume variability selected
samples are complete and unbiased. Just as with the
number counts above, the candidate object QLF must
be corrected for the completeness fraction and,
additionally, for systematic errors in astro-photometric redshifts.
For the QLF, we need to correct for incompleteness in
two dimensions: redshift and absolute magnitude
(luminosity). The gray-scale M − z bins in the left panel of
Figure 22 give the fraction of MQC quasars recovered.
This includes quasars that were not included in our test
set so as to correct to the true number of quasars, not
just those that met our test set criteria.

Since catastrophic errors in astro-photometric redshifts
can distort the QLF, corrections were determined as
follows: using bins of Δz = 0.1, the number of quasars
with astro-photometric redshift in that bin was divided
by the number of quasars with spectroscopic redshifts
in that bin. The resulting ratio is the correction that
needs to be applied to objects in each astro-photometric
redshift bin to statistically account for errors in the
astro-photometric redshift distribution (as opposed to
correcting individual values) and is shown in Figure 22, center
panel. The two corrections are multiplied together and
used as a weight for the objects in the QLF.
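
The redshift correction described above can be sketched as a ratio of two histograms. The code below follows the wording of the paragraph literally (counts binned by astro-photometric redshift divided by counts binned by spectroscopic redshift, in bins of Δz = 0.1) and then forms the per-object weight as the product of the two corrections. The function names, the redshift range, and the toy inputs are assumptions for illustration only.

```python
def photoz_correction(z_phot, z_spec, z_min=0.0, z_max=5.0, bin_width=0.1):
    """Per-bin ratio N(astro-photometric z in bin) / N(spectroscopic z in bin)."""
    n_bins = round((z_max - z_min) / bin_width)
    n_phot = [0] * n_bins
    n_spec = [0] * n_bins
    for zp, zs in zip(z_phot, z_spec):
        kp = int((zp - z_min) // bin_width)
        ks = int((zs - z_min) // bin_width)
        if 0 <= kp < n_bins:
            n_phot[kp] += 1
        if 0 <= ks < n_bins:
            n_spec[ks] += 1
    # Bins with no spectroscopic quasars have no defined correction.
    return [p / s if s > 0 else None for p, s in zip(n_phot, n_spec)]


def qlf_weight(completeness_correction, redshift_correction):
    """Weight for one object: the product of the two corrections."""
    return completeness_correction * redshift_correction


# Toy training-set quasars: one of four is scattered into a neighboring bin.
z_spec = [0.25, 0.25, 0.35, 0.35]
z_phot = [0.25, 0.35, 0.35, 0.35]
ratio = photoz_correction(z_phot, z_spec, z_max=1.0)
```

In the toy example the 0.2 - 0.3 bin holds half as many photometric as spectroscopic redshifts (ratio 0.5), and the 0.3 - 0.4 bin holds half again as many (ratio 1.5).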


We compute the QLF by binning the quasars in
redshift and absolute magnitude, using the method from
Page & Carrera (2000). Figure 22, right panel, shows
absolute magnitude as a function of astro-photometric
redshift for all quasar candidates. The grid shows the bins
within which the QLF is calculated. The edges of the
redshift bins are 0.30, 0.68, 1.06, 1.44, 1.82, 2.20, 2.6,
3.0, 3.5, 4.0, 4.5, and 5.0. The M_i bins are in
increments of 0.3 mag. The adopted limiting magnitude of
i = 22.0 is shown as a green line. The resulting i-band
QLF is shown as black dots with Poisson error bars in
Figure 23.
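
The binning step can be sketched with the redshift edges listed above and 0.3 mag bins in M_i. Note what this sketch does and does not do: it only accumulates weighted counts per (redshift, magnitude) bin. The Page & Carrera (2000) estimator additionally divides by the comoving volume and magnitude interval accessible in each bin, which requires a cosmology and the survey limits and is not reproduced here. The bright-end origin of the magnitude grid (`m_bright`) and the function names are hypothetical choices.

```python
from bisect import bisect_right
from collections import defaultdict

# Redshift bin edges as listed in the text.
Z_EDGES = [0.30, 0.68, 1.06, 1.44, 1.82, 2.20, 2.6, 3.0, 3.5, 4.0, 4.5, 5.0]
M_BIN_WIDTH = 0.3  # mag


def qlf_bin(z, abs_mag, m_bright=-30.0):
    """Return (redshift bin index, magnitude bin index), or None if outside.

    ASSUMPTION: magnitude bins start at the hypothetical value m_bright.
    """
    if not (Z_EDGES[0] <= z < Z_EDGES[-1]) or abs_mag < m_bright:
        return None
    z_idx = bisect_right(Z_EDGES, z) - 1
    m_idx = int((abs_mag - m_bright) // M_BIN_WIDTH)
    return z_idx, m_idx


def weighted_counts(objects):
    """Sum the weights of (z, M_i, weight) tuples in each QLF bin."""
    counts = defaultdict(float)
    for z, abs_mag, weight in objects:
        key = qlf_bin(z, abs_mag)
        if key is not None:
            counts[key] += weight
    return dict(counts)


# Toy candidates: (astro-photometric redshift, M_i, correction weight).
toy = [(2.4, -26.0, 1.5), (2.5, -25.9, 2.0), (3.2, -27.1, 1.0)]
binned = weighted_counts(toy)
```

The first two toy objects fall in the same bin (2.20 <= z < 2.6), so their weights add; the third lands in the 3.0 <= z < 3.5 bin.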

As with the number counts, the QLF analysis shows
relatively close agreement with the space density of
known quasars. There is evidence for both incompleteness
and contamination in the lowest redshift bin. This is
perhaps not surprising given the effects the host galaxy
has on quasar colors and apparent variability and the
fact that we only include point sources. We show the
Richards et al. (2006, Figure 18) and Ross et al. (2013,
Figure 11) SDSS spectroscopic QLFs in the z = 2.4, 2.8,
and 3.25 bins. This comparison reveals that our QLF
agrees better with the Ross et al. (2013) QLF. The Ross
et al. (2013) QLF has the smaller corrections of the two
spectroscopic QLFs, which suggests that the Richards
et al. (2006) QLF was undercorrected. In the three
highest redshift bins our QLF suggests a higher space
density than the Richards et al. (2006) QLF. This could be a
sign of contamination in our catalog, though it could also
be true to some extent, given the relatively large
completeness fraction for candidate selection needed for the
smaller spectroscopic sample from which the Richards
et al. (2006) QLF was derived. Most importantly, given
the lack of contamination and the dependability of the
completeness corrections, this analysis bodes well for our
future ability to determine the QLF for faint populations
in post-SDSS sky surveys.

9. FUTURE WORK 

The purpose of this investigation was to demonstrate 
that using a combination of optical colors and variability 
parameters improves quasar classification efficiency and 
completeness over the use of colors alone. This is one 
step toward finding an optimal strategy for photometric 
quasar selection. 

In future, we hope to use a data set that includes both
point sources and extended sources, thus incorporating
the variable nucleus with the steady host galaxy.
Additionally, we plan to explore alternative parameterizations
of quasar variability. The underlying mechanism
and most appropriate model for quasar variability
remain open questions and there are more sophisticated
models (e.g., Kelly et al. 2009; MacLeod et al. 2010;
Kasliwal et al. 2015) that merit exploration. Given the
large quantity of data expected in future surveys such as
LSST, a more computationally efficient approach than
the structure function may become important; e.g., the
Kelly et al. (2009), Kozlowski et al. (2010), and MacLeod
et al. (2010) approaches require only O(N) rather than
O(N^2) operations to determine the model parameters for
a light curve with N data points. As described in
Section 2.4, the likelihood method is biased and more
robust approaches such as those described in the
appendices of Kozlowski et al. (2010) or Hernitschek et al.
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Figure 19. Stacked histogram of redshift for known Stripe 82 quasars and new quasar candidates. Left panel shows the quasars and 
candidates i < 19.9 and right panel shows i > 19.9. Spectroscopic quasars found as quasar candidates and spectroscopic quasars missed 
are both binned by spectroscopic redshift. Quasar candidates found by both methods and quasar candidates found only using a binned 
quasar training set are both binned by where the candidate was classified with the highest probability. These bins only span 0.4 < z < 4.0. 
Quasar candidates found only using a quasar training set over the full redshift range are binned by the astro-photometric redshift. 




Figure 20. A_r vs. γ_r for the training sets (left panel) and test set (right panel) shown with the Schmidt et al. (2010) variability selection
cuts (Equations 13 - 15) as gray lines. Left panel: Orange contours show the non-quasar training set and purple contours and scatter points 
show the quasar training set. Right panel: Gray contours show all objects in the test set classified as non-quasars and green contours and 
scatter points show all objects in the test set classified as quasars. 


(2015) should be investigated. Currently, we use vari¬ 
ability data from each band separately. We hope to ex¬ 
plore the various methods for merging bands together, 
even with non-simultaneous observations as will be the 
case with LSST. 

This work relies on KDE for classification and it is
important to explore other methods to see if they will be
more successful. In the future, we hope to make use of
[...] ([...] 2003; [...] 2012; Chakraborty et al. 2013), such
as random forests (e.g., Cao et al. 2009; Richards et al.
2011; Carrasco Kind & Brunner 2013), gradient boosting
machines (Hastie et al. 2001), and Bayesian classification
with hash tables ([...] et al. 2014). Additionally, in the
future our catalogs will not have binary classifications,
but will simply give probabilities for all objects.

We have used the combination of optical and mid- 
infrared (MIR) colors for quasar selection in another pa¬ 
per (Richards et al. 2015, submitted). In the future we 
will combine optical, IR, and variability data to produce 
the most complete and efficient catalog possible. 

In order to improve the astro-photometric redshift es¬ 
timations, we will multiply smoothed PDFs instead of 
adding by weights. Additionally, we will incorporate the 

Figure 21. Left panel: Ratio of quasars in the MQC on Stripe 82 to those MQC quasars returned by classification using a training set
over the full redshift range. This allows us to correct for objects with too few observations to calculate variability parameters, the exclusion
of extended sources, and incompleteness in the selection algorithm. The fraction is given as a function of coadded i-band magnitude for two
redshift ranges. Right panel: Quasar number counts as a function of redshift and i-band magnitude. Black points give the spectroscopic
number counts reported in Richards et al. (2009a); circles for z < 2.2 and triangles for 3 < z < 5. The open purple and green squares
give the raw number counts (with Poisson error bars) for the candidates reported here. The filled colored squares give the number counts
corrected using the left panel. The vertical dashed red line at i = 19.9 indicates the target selection depth for low-redshift on Stripe 82.

Figure 22. Corrections and cuts used in the QLF in Figure 23. Left panel: Completeness fraction, in bins of redshift and absolute
magnitude, M_i[z = 2], for candidate selection. Similar to Figure 21, left panel, but in two dimensions. The number of quasars with
spectroscopic redshifts on Stripe 82, even if they were excluded from our training set and test set, was divided by all quasars with
spectroscopic redshifts that were recovered as candidate quasars. This is to correct for incompleteness from too few observations to calculate
variability parameters, the exclusion of extended sources, and incompleteness in the classification algorithm. Center panel: Completeness
fraction for astro-photometric redshifts. All of the training set quasars are binned by spectroscopic redshift (purple) and astro-photometric
redshifts (green). The ratio of the two is shown in grey (right axis). The astro-photometric redshifts of the candidate quasars, after being
corrected by the completeness fraction and assuming that objects without spectroscopic redshifts have the same astro-photometric redshift
errors as those with spectroscopic redshifts, are shown in black. Right panel: Astro-photometric redshift vs. absolute magnitude, M_i[z = 2],
of all quasar candidates. The green line shows the brightness limit for bins that are used in computing the luminosity function. Purple
curves show the i = 15.0, 19.1, and 20.2 magnitude limits for SDSS spectroscopic follow-up.


clustering redshift estimation of Menard et al. (2013) and
Rahman et al. (2015) and explore photometric and
astro-photometric redshift accuracy without u band
observations (to mimic DES and Pan-STARRS observations).
Finally, we will explore how simultaneous color and
variability classification performs using other time-domain
surveys, including DES (The Dark Energy Survey
Collaboration 2005), Pan-STARRS (Kaiser et al. 2010), and
LSST simulated data (Connolly et al. 2014).


10. CONCLUSIONS 

Using the Non-parametric Bayesian Classification Ker¬ 
nel Density Estimation (NBC KDE) quasar selection al¬ 
gorithm, we demonstrated that using a combination of 
optical colors and variability parameters improves quasar 
classification efficiency and completeness over the use 
of colors alone. For classification using colors alone, 
there are redshift ranges with poor completeness where 
the quasar and non-quasar training sets overlap in color 
space. Variability alone does not have these redshift 

Figure 23. M_i[z = 2] binned luminosity function of the sample with astro-photometric redshifts using the method from Page & Carrera
(2000) (with Poisson error bars). The mean redshift of each slice is given in each panel. Black filled circles are complete bins, empty
triangles indicate the lower limit for complete bins where the completeness fraction (shown in Figure 22, left panel) is 0, and empty circles
are partial bins (a portion of the bin is dimmer than i = 22). The grey circles show the binned luminosity function and the grey dashed
line shows the z = 2.01 curve, both from Richards et al. (2006, Figure 18) for comparison. In the z = 2.4, 2.8, and 3.25 panels, the red
squares show the binned luminosity function for BOSS quasars from DR9 from Ross et al. (2013, Figure 11). In the z = 4.75 panel, the green
squares, purple squares, and dashed black line show the binned luminosity function at z = 4.9 for Stripe 82, DR7, and double power law
fits from the maximum likelihood analysis from McGreer et al. (2013, Figure 12 and Figure 13).

trends, but it has a lower efficiency than coadded
colors at all other redshifts. The addition of coadded colors
to variability information improves classification at all
redshifts. Using variability alone, colors alone, and
combining variability and colors we achieve 91%,
93%, and 97% quasar completeness and 98%, 98%, and
97% efficiency respectively, with particular improvement
in the selection of quasars at 2.7 < z < 3.5, as shown in
Figure 7.

We classified quasars and estimated their redshifts
simultaneously by limiting the training set to
non-overlapping redshift bins from 0.4 to 4.0 with a bin width
of 0.2. We successfully classified known quasars into the
correct redshift bins with 75% or higher completeness,
depending on the redshift bin, as shown in Figure 9.

Overall, we identified 35,820 type 1 quasar candidates
in the SDSS Stripe 82 field using the combination of
optical photometry and variability either over the full
redshift range or within one of the redshift bins. Of the
13,221 spectroscopically confirmed quasars that could
have been returned, we found 12,953 (98.0%
completeness). Of the 22,867 quasar candidates that are not
spectroscopically confirmed, 21,876 (95.7%) are dimmer than
a coadded i-band magnitude of 19.9. Figure 18 shows the
magnitude distribution of the candidate quasars.

Photometric redshift estimates of these candidates
using optical photometry and astrometric parameters are
accurate to within |Δz| < 0.1 for 51.6% of quasars and
within 0.3 for 76.8% of quasars. The combination of
optical photometry and astrometry makes the photometric
redshifts more accurate when colors alone return the
correct redshift as one of the secondary peaks in the PDF.
The astrometric PDF pulls out the correct peak in the
color PDF, as shown in Figure 14. We find that objects
with photometric redshifts of z ~ 1.25 and z ~ 3.25 are
particularly robust.

In Figure 20, our color and variability selection was
compared to other cuts in variability space that have
been used on Stripe 82. We demonstrated that
variability alone is incomplete and that our hybrid approach
will yield better results for future surveys. Additionally,
we have shown that our selection recovered 97% of the
quasars in the DR10 quasar catalog and 98% of the quasars
new in the DR12 quasar catalog, and we selected
additional candidates in the BOSS redshift range with high
confidence (and at even higher redshift).

We used MIR color cuts to remove a small number
of bright star contaminants from our final candidate list.
Our number counts and quasar luminosity function
analyses, shown in Figures 21 and 23, show there is little
contamination remaining and that there is relatively close
agreement with the space density of known quasars.

From the NBC KDE classification test set, we present 
a catalog of known quasars and candidate quasars on 
Stripe 82. The catalog is available as a FITS file online. 
Future work along these lines will be needed to capitalize 
on the imaging data produced by Pan-STARRS, DES, 
and LSST. 
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APPENDIX 


CATALOG COLUMNS 

In Section 7 we created a catalog of quasar candidates. All of the columns in the catalog are described in Table 7,
but a few columns need some extra explanation. 

Columns 4 to 13 are the single epoch magnitudes and magnitude errors used for the single epoch classification.
We used a randomly chosen epoch from the observations for each object. The single epoch magnitudes are asinh
magnitudes from Lupton et al. (1999). Columns 14 to 18 are the coadded magnitudes and magnitude errors. The
coadded magnitudes are from Annis et al. (2014). Columns 19 to 24 are the VHS magnitudes and magnitude errors
in Vega. Columns 25 to 28 are the WISE magnitudes and magnitude errors in AB.

Columns 30 to 35 are labels. Specifically, column 30 is the candidate label: if the object was classified as a quasar in
either Section 5.1 (over the whole redshift range) or 5.2 (in redshift bins) the value is 1, otherwise it is 0. Column 31
is the MQC label: if the object is in the catalog the value is 1, otherwise it is 0. Column 32 is the DR10Q label: if the
object is in the catalog the value is 1, otherwise it is 0. Column 33 is the DR12Q label: if the object is in the catalog
the value is 1, otherwise it is 0. Column 34 is the WISE cut label: if the object is cut by Eqs. 10 and 11 the value is
1, otherwise it is 0. Column 35 is the white dwarf cut label: if the object is cut by Eq. 12 the value is 1, otherwise it
is 0. To retrieve the quasar candidates that pass these cuts (the "good" quasar candidates) perform this query on the
catalog: WISEcut_label == 0 & WDcut_label == 0 & candidate_label == 1. To limit to the new candidates (not
spectroscopically confirmed quasars) add: & zspec < 0.
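
The query above can be expressed as a row filter. The sketch below uses plain Python dictionaries for toy rows so that it runs without external packages; the column names are taken from the query (assuming underscores throughout, e.g. `candidate_label`), while the function names and the row values are illustrative assumptions.

```python
def is_good_candidate(row):
    """Quasar candidate that survives the WISE and white dwarf cuts."""
    return (
        row["WISEcut_label"] == 0
        and row["WDcut_label"] == 0
        and row["candidate_label"] == 1
    )


def is_new_candidate(row):
    """Good candidate without a spectroscopic redshift (zspec = -9999)."""
    return is_good_candidate(row) and row["zspec"] < 0


# Toy rows for illustration only; real rows come from the catalog FITS file.
catalog = [
    {"id": 1, "candidate_label": 1, "WISEcut_label": 0, "WDcut_label": 0, "zspec": 2.31},
    {"id": 2, "candidate_label": 1, "WISEcut_label": 0, "WDcut_label": 0, "zspec": -9999},
    {"id": 3, "candidate_label": 1, "WISEcut_label": 1, "WDcut_label": 0, "zspec": -9999},
    {"id": 4, "candidate_label": 0, "WISEcut_label": 0, "WDcut_label": 0, "zspec": 1.05},
]
good_ids = [row["id"] for row in catalog if is_good_candidate(row)]
new_ids = [row["id"] for row in catalog if is_new_candidate(row)]
```

With an array-based table (for example one read with astropy), the same conditions can be combined with `&` into a boolean mask, as in the query text.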


16 astropy.org 

17 starlink.ac.uk/topcat 

18 starlink.ac.uk/stilts 


19 matplotlib.org 

20 github.com/CKrawczyk/densityplot 

Table 7

Column Names

index     Name                              Description
1         id                                SDSS Coadded ParentID
2 - 3     ra, dec                           coadded right ascension, declination
4 - 8     u, g, r, i, z                     single epoch magnitude
9 - 13    uErr, gErr, rErr, iErr, zErr      single epoch magnitude error
14 - 18   coadd_u, coadd_g...               coadded magnitude
19 - 21   J, H, KS                          VHS magnitude
22 - 24   JErr, HErr, KSErr                 VHS magnitude error
25 - 26   W1, W2                            WISE magnitude
27 - 28   W1Err, W2Err                      WISE magnitude error
29        zspec                             spectroscopic redshift, if none, the value is -9999.
30        candidate_label                   1 if selected as a quasar candidate, 0 otherwise.
31        MQC_label                         1 if in MQC, 0 otherwise.
32        DR10Q_label                       1 if in DR10Q, 0 otherwise.
33        DR12Q_label                       1 if in DR12Q, 0 otherwise.
34        WISEcut_label                     1 if cut by Eqs. 10 and 11, 0 otherwise.
35        WDcut_label                       1 if cut by Eq. 12, 0 otherwise.
36 - 45   A_u, gamma_u, A_g, gamma_g...     variability parameters, if none, the value is -9999.
46 - 47   auPar, agPar                      astrometry parameters, if none, the value is -9999.
48        qso_prob                          probability of being a quasar
49        star_dens                         star density
50        qso_dens                          quasar density
51        qso_prob_bins                     vector - probability of being a quasar
52        star_dens_bins                    vector - star density
53        qso_dens_bins                     vector - quasar density
54        qso_prob_max                      maximum value of qso_prob_bins vector
55        qso_prob_max_bin                  redshift bin of maximum value of qso_prob_bins vector
56        photoz_ugriz_pdf                  vector - full photo-z PDF, SDSS colors
57        photoz_ugriz_low                  low redshift end of the peak in photo-z PDF, SDSS colors
58        photoz_ugriz_best                 peak of photo-z PDF, SDSS colors
59        photoz_ugriz_high                 high redshift end of the peak in photo-z PDF, SDSS colors
60        photoz_ugriz_prob                 probability of photo-z, SDSS colors
61        photoz_astrometry_pdf             vector - full photo-z PDF, astrometry
62        photoz_astrometry_low             low redshift end of the peak in photo-z PDF, astrometry
63        photoz_astrometry_best            peak of photo-z PDF, astrometry
64        photoz_astrometry_high            high redshift end of the peak in photo-z PDF, astrometry
65        photoz_astrometry_prob            probability of photo-z, astrometry
66        photoz_added_pdf                  vector - full photo-z PDF, SDSS colors and astrometry
67        photoz_added_max                  max of photo-z PDF, SDSS colors and astrometry
68        photoz_added_max_bin              redshift bin of maximum value of photo-z PDF, SDSS colors and astrometry
69        photoz_ugrizJHK_pdf               vector - full photo-z PDF, SDSS and JHK colors
70        photoz_ugrizJHK_low               low redshift end of the peak in photo-z PDF, SDSS and JHK colors
71        photoz_ugrizJHK_best              peak of photo-z PDF, SDSS and JHK colors
72        photoz_ugrizJHK_high              high redshift end of the peak in photo-z PDF, SDSS and JHK colors
73        photoz_ugrizJHK_prob              probability of photo-z, SDSS and JHK colors
74        gi_sigma                          g-i color offset from the mean color
75        SDSSSPECMATCH                     1 if the object had a spectrum from the original SDSS, 0 otherwise
76        BOSSSPECMATCH                     1 if the object had a spectrum from BOSS, 0 otherwise
77        DR12QSOMATCH                      1 if the object is visually inspected as a quasar in the DR12Q, 0 otherwise
78        ZSDSS                             pipeline redshift from SDSS
79        CLASSSDSS                         pipeline classification from SDSS
80        ZBOSS                             pipeline redshift from BOSS
81        CLASSBOSS                         pipeline classification from BOSS
82        DR12QSO_Z_VI                      redshift of the quasar if included in the DR12Q


Columns 48 to 55 are the various classification results. Specifically, columns 48 to 50 are the results of classifying
the test set over the full redshift range as described in Section 5.1. Column 48 is the probability of being a quasar,
column 49 is the star density from the KDE (P(D|M)), and column 50 is the quasar density from the KDE. If the
object was not found to be a candidate over the full redshift range the value is -9999. Columns 51 to 55 are the results
of classification using redshift bins as described in Section 5.2. Column 51 is the probability of being a quasar, column
52 is the star density, and column 53 is the quasar density. Each is a vector with 18 cells, one for each redshift bin
from 0.4 to 4.0. If the object was not found to be a candidate in any bin, all cell values are -9999. Column 54 is
the maximum value of column 51, and column 55 is the center of the redshift bin corresponding to that maximum
probability. If the object was not found to be a candidate in any bin these columns will be -9999.
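
The relationship between column 51 and columns 54 and 55 can be sketched as follows, assuming 18 equal bins of width 0.2 from z = 0.4 to z = 4.0 (so the bin centers run from 0.5 to 3.9) and the -9999 sentinel described above. The function names are hypothetical, and this is only an illustration of the bookkeeping, not the code used to build the catalog.

```python
N_BINS = 18
Z_MIN = 0.4
BIN_WIDTH = 0.2
MISSING = -9999


def bin_centers():
    """Centers of the 18 redshift bins: 0.5, 0.7, ..., 3.9."""
    return [round(Z_MIN + BIN_WIDTH * (k + 0.5), 1) for k in range(N_BINS)]


def max_prob_and_bin(qso_prob_bins):
    """Return (maximum probability, center of its redshift bin).

    If every cell holds the sentinel, both outputs are the sentinel.
    """
    if len(qso_prob_bins) != N_BINS:
        raise ValueError("expected one probability per redshift bin")
    if all(p == MISSING for p in qso_prob_bins):
        return MISSING, MISSING
    best = max(range(N_BINS), key=lambda k: qso_prob_bins[k])
    return qso_prob_bins[best], bin_centers()[best]


# Toy vector: low probability everywhere except the 3.0 < z < 3.2 bin.
probs = [0.1] * N_BINS
probs[13] = 0.9
result = max_prob_and_bin(probs)
```

For the toy vector the maximum probability sits in the fourteenth bin, whose center is z = 3.1.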

Columns 56 to 73 are the various redshift estimation results. If we were unable to calculate a redshift estimate for
the object the value will be -9999. Column 74 indicates whether the object's g − i color is within 1σ (0.68), 2σ (0.95),
or 3σ (0.99) of the mean color for quasars at the astro-photometric redshift. Outliers are an indication of either bad
estimated redshifts or non-quasar contaminants. Columns 75 to 82 are the details of matching to all spectra taken on
SDSS Stripe 82.
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